#Multi-LLM Systems

2 posts

[Paper Review] Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey

A detailed review of the paper 'Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey', posted on arXiv by John D. Kelleher.

#Review #LLM Inference #Model Routing #Model Cascading #Efficiency Optimization #Dynamic Model Selection #Multi-LLM Systems #Cost-Performance Trade-off #Adaptive AI Systems

March 8, 2026

[Paper Review] Cache-to-Cache: Direct Semantic Communication Between Large Language Models

A detailed review of the paper 'Cache-to-Cache: Direct Semantic Communication Between Large Language Models', posted on arXiv.

#Review #Large Language Models (LLMs) #Inter-model Communication #KV-Cache #Semantic Transfer #Multi-LLM Systems #Cache Fusion #Latency Reduction #Knowledge Sharing

October 9, 2025