#Response Length Prediction

1 post

[Paper Review] TimeBill: Time-Budgeted Inference for Large Language Models

The goal is to complete large language model (LLM) inference within a given time budget in time-constrained systems (e.g., robotics, autonomous driving) while preserving response quality.

#Review #LLM Inference #Time Budgeting #KV Cache Eviction #Response Length Prediction #Execution Time Estimation #Real-time AI #Performance Optimization

December 28, 2025