#Real-world tasks

1개의 포스트

[논문리뷰] LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

본 논문은 기존 도구 사용 벤치마크가 시뮬레이션되거나 소규모의 MCP(Model Context Protocol) 서버에 국한되어 실제 대규모의 동적인 환경을 반영하지 못하는 한계를 지적합니다.

#Review #LLM Agent #Tool-use #MCP #Benchmark #Large-scale #Real-world tasks #Automated Evaluation #Meta-tool-learning

2025년 8월 6일