#Dynamic Top-p Selection

1 post

[Paper Review] Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

This paper proposes converting full attention into an efficient sparse structure to address the quadratic cost that full attention incurs during long-context inference.

#Review #Long-context LLM #Sparse Attention #Head Specialization #Dynamic Top-p Selection #Efficient Inference #Self-distillation

May 21, 2026