#KV Cache Eviction

2개의 포스트

[논문리뷰] LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

최근 LLM의 Context Length가 급증하면서 KV Cache 의 크기가 입력 시퀀스 길이에 비례하여 선형적으로 증가하며, 이는 long-context task 에서 메모리 병목 현상을 야기하여 inference scalability에 큰 제약을 초래하고 있습니다.

#Review #KV Cache Eviction #Long Context LLM #Attention Score Prediction #LoRA #Parameter-Efficient #Time-to-First-Token

2026년 3월 15일

[논문리뷰] TimeBill: Time-Budgeted Inference for Large Language Models

시간 제약이 있는 시스템(예: 로봇 공학, 자율 주행)에서 대규모 언어 모델(LLM)의 응답 성능을 유지하면서 주어진 시간 예산 내에 추론을 완료하는 문제를 해결하는 것이 목표입니다.

#Review #LLM Inference #Time Budgeting #KV Cache Eviction #Response Length Prediction #Execution Time Estimation #Real-time AI #Performance Optimization

2025년 12월 28일