#Multi-turn Conversation

4 posts

[Paper Review] FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Large Language Models (LLMs) are increasingly taking on the role of agents in financial applications, where they must interpret user requests, call external tools, and perform multi-step reasoning.

#Review #LLM Agents #Financial Tool Use #Benchmarking #Model Context Protocol #Multi-tool Reasoning #Multi-turn Conversation #Evaluation Metrics

March 26, 2026

[Paper Review] CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark

This paper addresses the lack of benchmarks for comprehensively evaluating Multi-Modal Retrieval-Augmented Generation (MM-RAG) systems in wearable AI scenarios.

#Review #Multi-modal RAG #Benchmark #Wearable AI #Multi-turn Conversation #Egocentric Images #Knowledge Graph #Web Search #Hallucination

October 31, 2025

[Paper Review] ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

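This paper aims to address the degradation in Large Language Model (LLM) performance over multi-turn conversations. In particular, it seeks to mitigate the loss of accuracy and reliability that occurs when an LLM "loses track" of the conversational context as information is supplied incrementally.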

#Review #Multi-turn Conversation #Large Language Models (LLMs) #Context Management #Entropy-guided Resetting #Uncertainty Quantification #Performance Degradation #Prompt Engineering #Conversational AI

October 20, 2025

[Paper Review] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

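This paper addresses the failure of existing LLM agent benchmarks to capture the complexity of real-world environments (processing large volumes of information, using diverse resources, and handling dynamic user interaction). With VitaBench, it aims to evaluate agent capabilities across diverse simulated real-world environments and to close this gap.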

#Review #LLM Agents #Benchmarking #Interactive Tasks #Real-world Applications #Tool Use #Multi-turn Conversation #Task Complexity

October 1, 2025