#Attention Score Prediction

1개의 포스트

[논문리뷰] LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

최근 LLM의 Context Length가 급증하면서 KV Cache 의 크기가 입력 시퀀스 길이에 비례하여 선형적으로 증가하며, 이는 long-context task 에서 메모리 병목 현상을 야기하여 inference scalability에 큰 제약을 초래하고 있습니다.

#Review #KV Cache Eviction #Long Context LLM #Attention Score Prediction #LoRA #Parameter-Efficient #Time-to-First-Token

2026년 3월 15일