#Streaming Knowledge

1 post

[Paper Review] Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents

This paper proposes a new hybrid architecture that aims to meet two requirements at once: the strong reasoning ability of cloud-based LLMs and the immediate responsiveness of on-device models.

#Review #Conversational Infill #On-device AI #Model Collaboration #Latency #Streaming Knowledge #LLM

June 28, 2026