#Decode

1 post

[Paper Review] Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

This paper aims to address two costs of long-context inference in decoder-only models: the high compute cost of the prefill phase and the KV-cache memory-bandwidth limit of the decode phase.

#Review #Long-Context Inference #KV-Cache #Phase-Asymmetric #Prefill #Decode #Transformer

May 10, 2026