#Block-Sparse Attention

3 posts

[Paper Review] CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

This paper proposes CompactAttention to overcome the performance limitations of Block-Sparse Attention and Query-Subsampled KV Selection in existing Chunked Prefill settings.

#Review #Chunked Prefill #KV Selection #Block-Sparse Attention #Paged Attention #Zero-Copy Execution #Long-Context LLM

May 18, 2026

[Paper Review] Prism: Spectral-Aware Block-Sparse Attention

The goal is to solve the problem of efficiently estimating block importance in block-sparse attention, which is used to accelerate the pre-filling stage of LLMs that process long contexts.

#Review #Block-Sparse Attention #Long-Context LLM #Rotary Positional Embeddings #Spectral Analysis #Attention Efficiency #Pre-filling Acceleration

February 10, 2026

[Paper Review] Sparser Block-Sparse Attention via Token Permutation

This paper aims to address the heavy computational cost and memory bottleneck that the self-attention mechanism, with its O(N^2) complexity, causes when LLMs process long contexts.

#Review #Large Language Models (LLMs) #Self-Attention #Block-Sparse Attention #Token Permutation #Computational Efficiency #Prefilling #Long Context #Causal Attention

October 27, 2025