#Chunked Prefill

2 posts

[Paper Review] CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

This paper proposes CompactAttention to overcome the performance limitations of Block-Sparse Attention and Query-Subsampled KV Selection in existing Chunked Prefill settings.

#Review #Chunked Prefill #KV Selection #Block-Sparse Attention #Paged Attention #Zero-Copy Execution #Long-Context LLM

May 18, 2026

[SGLang] Continuous Batching & Chunked Prefill: The Core of Dynamic Batching

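An analysis of SGLang's Continuous Batching and Chunked Prefill. It walks through, with code, the dynamic batching that admits a new request as soon as a running one finishes, and the strategy of splitting long prompts into chunks.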

#sglang #Continuous Batching #Chunked Prefill #Dynamic Batching

April 10, 2026