#Long-Context LLM

4 posts

[Paper Review] CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

This paper proposes CompactAttention to overcome the performance limitations of Block-Sparse Attention and Query-Subsampled KV Selection in existing Chunked Prefill settings.

#Review #Chunked Prefill #KV Selection #Block-Sparse Attention #Paged Attention #Zero-Copy Execution #Long-Context LLM

May 18, 2026

[Paper Review] UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

This paper addresses the problem that existing prefill acceleration techniques are ill-suited to recent hybrid LLM architectures and continuous batching environments.

#Review #Long-Context LLM #Prefill Acceleration #Dynamic Sparsification #Hybrid Architectures #Continuous Batching #vLLM

May 10, 2026

[Paper Review] Prism: Spectral-Aware Block-Sparse Attention

The goal is to solve the problem of efficiently estimating block importance in block-sparse attention, which is used to accelerate the pre-filling stage of long-context LLMs.

#Review #Block-Sparse Attention #Long-Context LLM #Rotary Positional Embeddings #Spectral Analysis #Attention Efficiency #Pre-filling Acceleration

February 10, 2026

[Paper Review] Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

This paper proposes an efficient hybrid architecture that addresses the compute and I/O overhead of Softmax Attention at long sequence lengths and overcomes the performance limitations of pure Linear Attention models.

#Review #Long-Context LLM #Hybrid Attention #Linear Attention #Mixture-of-Experts #FP8 Training #GPU Optimization #Training-Inference Alignment #Reinforcement Learning

October 23, 2025