#Agentic Models

5 posts

[Paper Review] Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

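This paper addresses the problem that existing LLM agent evaluation methods focus on static, short-horizon tasks and therefore fail to reflect the complex, long-horizon workflows required in real production environments.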

#Review #Agentic Models #Runtime Assessment #Software Engineering #Long-horizon Workloads #Compiler Construction #Resurrection Protocol #Production Systems

June 3, 2026

[Paper Review] VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Recent MLLMs are evolving into agentic problem solvers through integration with external tools, but they face a persistent bottleneck in accurately executing diverse tools and effectively composing them for complex visual tasks.

#Review #Multimodal Large Language Models #Visual Tool Chaining #Agentic Models #Benchmark #OpenCV #Compositional Reasoning #Tool-use Evaluation

March 19, 2026

[Paper Review] Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

This paper addresses the lack of scalable infrastructure for high-quality interaction signals, which LLMs need in order to evolve from passive responders into autonomous agents.

#Review #Agentic Models #Large Language Models (LLMs) #Agentic Scaling #Environment Construction #NexAU #NexA4A #NexGAP #Interactive Environments

December 4, 2025

[Paper Review] Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

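The paper aims to address the limitations of existing multimodal agent systems: the separation of image manipulation from web search, reliance on costly reinforcement learning (RL), and planning that is disconnected from actual tool execution.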

#Review #Multimodal AI #Agentic Models #Interleaved Reasoning #Image Manipulation #DeepSearch #Supervised Fine-tuning (SFT) #Tool-Augmented LLM

December 2, 2025

[Paper Review] GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

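This work addresses the limitation that existing agentic visual reasoning models focus mainly on image manipulation tools, which makes them hard to extend to general-purpose use.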

#Review #Geolocalization #Agentic Models #Visual Reasoning #Web-Augmented #Multimodal LLMs #Reinforcement Learning #Tool Use #GeoBench

November 23, 2025