#Similarity Alignment

1 post

[Paper Review] M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark

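This study proposes M³-Bench, the first benchmark for evaluating the realistic tool-use capabilities of multimodal LLM (MLLM) agents. It moves beyond a limitation of existing LLM tool-use benchmarks, which focus mainly on text-based, linear API planning.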

#Review #Multimodal LLM #Tool Use #Agent Benchmark #Model Context Protocol #Multi-Hop Reasoning #Multi-Threaded Execution #Evaluation Metrics #Similarity Alignment

November 24, 2025