#Model Context Protocol (MCP)

8 posts

[Paper Review] Terminal Agents Suffice for Enterprise Automation

The authors propose StarShell, a minimal coding agent that communicates directly with platform APIs through the terminal and filesystem. Rather than relying on a predefined tool registry, StarShell actively discovers capabilities and composes tasks from documentation and API responses.

#Review #Enterprise Automation #Agentic Systems #Terminal-based Agents #API Interaction #Model Context Protocol (MCP) #Coding Agents

April 1, 2026

[Paper Review] MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

The paper aims to overcome two limitations of AndroidWorld, an existing mobile GUI agent benchmark: saturation (success rates above 90%) and unrealistic task complexity.

#Review #Mobile Agents #GUI Benchmarking #Agent-User Interaction #Tool-Augmented Agents #Model Context Protocol (MCP) #Long-Horizon Tasks #Reproducible Evaluation #Android Environment

December 22, 2025

[Paper Review] MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools

This paper addresses the lack of a standardized benchmark that can accurately evaluate the real-world performance of language agents that use tools through the Model Context Protocol (MCP).

#Review #Language Agents #Tool Use #Benchmarks #Model Context Protocol (MCP) #LLM Evaluation #Agentic AI #Real-World Performance

September 15, 2025

[Paper Review] Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

This paper addresses the problem that the technical barriers of static research papers make their code and methods hard to use and disseminate. It aims to present a new paradigm that converts papers into interactive, reliable AI agents, accelerating the downstream use, adoption, and discovery of research results.

#Review #AI Agents #Research Reproducibility #Scientific Communication #Model Context Protocol (MCP) #Natural Language Interaction #Genomics #Single-Cell Analysis #Spatial Transcriptomics

September 9, 2025

[Paper Review] MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

This paper introduces MCP-Bench, a large-scale benchmark that overcomes the limitations of existing tool-use benchmarks by evaluating LLM agents on realistic, complex multi-step tasks. It focuses in particular on tool retrieval under fuzzy instructions, cross-tool coordination, precise parameter control, and long-horizon planning and reasoning.

#Review #LLM Agents #Tool Use #Benchmarking #Model Context Protocol (MCP) #Cross-Domain Orchestration #Fuzzy Instructions #Multi-Step Tasks #Real-World Scenarios

August 29, 2025

[Paper Review] LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

This paper focuses on evaluating tool calling, a capability essential for AI agents to interact with the real world and solve complex tasks.

#Review #AI Agents #Tool Use #Model Context Protocol (MCP) #Benchmarking #Large Language Models (LLMs) #Real-world Tasks #Evaluation #Error Analysis

August 22, 2025

[Paper Review] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

This paper addresses the limitation that existing language agent benchmarks fail to adequately reflect real-world diversity, complexity, and long-horizon task execution.

#Review #Language Agents #Tool Use #Benchmarking #Long-Horizon Tasks #Realistic Environments #Multi-Application #Execution-Based Evaluation #Model Context Protocol (MCP)

October 30, 2025

[Paper Review] OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents

The paper aims to overcome the limitation that existing GUI agent benchmarks overlook tool invocation through the Model Context Protocol (MCP) and evaluate only GUI interaction.

#Review #Multimodal Agents #Tool Invocation #Benchmark #Model Context Protocol (MCP) #GUI Automation #Computer-Use Agents #Evaluation Metrics

October 29, 2025