#Disaggregated Inference

2 posts

[Paper Review] KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

This paper addresses a major bottleneck in disaggregated LLM serving: KV cache communication, which accounts for up to 60% of total end-to-end latency.

#Review #LLM Serving #KV Cache Compression #Disaggregated Inference #Bayesian Optimization #Service-Aware Control

May 21, 2026

[Paper Review] RDMA Point-to-Point Communication for LLM Systems

The paper aims to provide the flexible point-to-point communication that LLM systems need, and to resolve the vendor lock-in and hardware portability problems that arise because existing RDMA implementations are tied to specific NICs (Network Interface Controllers).

#Review #RDMA #LLM #Point-to-Point Communication #Disaggregated Inference #MoE Routing #KvCache #AWS EFA #NVIDIA ConnectX

November 9, 2025