#Service-Aware Control

1 post

[Paper Review] KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

This paper aims to resolve a major bottleneck in disaggregated LLM serving, where KV cache communication accounts for up to 60% of total end-to-end latency.

#Review #LLM Serving #KV Cache Compression #Disaggregated Inference #Bayesian Optimization #Service-Aware Control

May 21, 2026