#Agentic Systems

20 posts

[Paper Review] Tracing Agentic Failure from the Flow of Success

This paper proposes Oat to address the cost and inefficiency of automatically diagnosing failures in LLM-based agent systems.

#Review #LLM Agents #Failure Attribution #Unsupervised Learning #Neural CDE #One-Class Learning #Anomaly Detection #Agentic Systems

July 15, 2026

[Paper Review] Self-Improvements in Modern Agentic Systems: A Survey

This paper provides a systematic analysis of how modern Agentic Systems can expand their own capabilities through experience while minimizing human intervention. Prior work has focused on individual improvement techniques, but a unified framework covering them has been lacking.

#Review #Agentic Systems #Self-Improvement #Foundation Model #Scaffolding #Meta-Learning #Autonomous Agents

July 15, 2026

[Paper Review] EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

This paper addresses the challenge that modern autonomous agents must go beyond producing static outputs and iteratively improve executable policies through environment feedback. Existing benchmarks either evaluate only the final score or are entangled with complex engineering work, which makes it hard to measure an agent's "policy evolution" ability in isolation.

#Review #Autonomous Policy Evolution #Interactive Environments #Benchmark #Agentic Systems #Policy Optimization #Trajectory Analysis

July 2, 2026

[Paper Review] Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

This study starts from the concern that existing agent benchmarks focus too heavily on overly simple tasks or familiar web environments, and so fail to adequately detect the potential limitations of modern agents. Because existing benchmarks mainly target consumer-oriented tasks such as online shopping or simple information retrieval, agent performance saturates early.

#Review #Agentic Systems #GauntletBench #Temporal Perception #Graphical Understanding #3D Reasoning #Generalization #Multimodal Large Language Models

June 25, 2026

[Paper Review] From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI

This paper systematically analyzes the paradigm shift in which LLMs evolve from chatbots that merely generate text into Digital Colleagues that autonomously carry out work in digital environments.

#Review #Large Language Models #Autonomous AI #Digital Colleague #Workspace + Skill #Task Closure #Agentic Systems #Inference-time Computation

June 14, 2026

[Paper Review] DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

This paper addresses the lack of a standardized platform and benchmark for systematically evaluating the security threats facing AI agents that automate complex workflows.

#Review #AI Agents #Red-Teaming #Safety Evaluation #Agentic Systems #Security Risk Assessment

May 10, 2026

[Paper Review] Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

This paper proposes the HDPO framework, which optimizes task accuracy and tool efficiency orthogonally. The method cleanly separates an accuracy channel from an efficiency channel; the efficiency channel computes a conditional advantage that minimizes tool use only within trajectories that produced a correct result.

#Review #Multimodal Large Language Models #Agentic Systems #Reinforcement Learning #Hierarchical Decoupled Policy Optimization #Meta-Cognitive Tool Use #Efficiency #Reasoning

April 9, 2026

[Paper Review] Terminal Agents Suffice for Enterprise Automation

The authors propose StarShell, a minimal coding agent that communicates directly with platform APIs through a terminal and filesystem. Rather than relying on a predefined tool registry, StarShell actively discovers capabilities and composes tasks from documentation and API responses.

#Review #Enterprise Automation #Agentic Systems #Terminal-based Agents #API Interaction #Model Context Protocol (MCP) #Coding Agents

April 1, 2026

[Paper Review] MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

This paper proposes MiroEval to address the limitation that existing evaluations of deep research systems do not sufficiently reflect the complex requirements of real users. Prior work mainly uses static tasks and evaluates only the quality of the final report, without auditing the research process.

#Review #Deep Research #Multimodal Benchmark #Process-Centric Evaluation #Factuality Verification #Agentic Systems #Adaptive Synthesis

April 1, 2026

[Paper Review] DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

The paper aims to overcome the limitations of existing presentation-generation agents: predefined workflows, content-agnostic templates, and self-reflection that relies only on internal signals.

#Review #Agentic Systems #Presentation Generation #Large Language Models (LLMs) #Multimodal LLMs (MLLMs) #Environment-Grounded Reflection #Self-Correction #Dual-Agent Framework #Supervised Fine-tuning

March 8, 2026

[Paper Review] Agents of Chaos

This paper reports an exploratory red-team study of autonomous language-model-based agents deployed in a live environment with persistent memory, email, Discord access, a file system, and shell execution.

#Review #AI Agents #Red-teaming #Agentic Systems #Multi-Agent Communication #Security Vulnerabilities #Prompt Injection #Social Engineering #Resource Management

February 23, 2026

[Paper Review] AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research

This paper aims to overcome the limitations of existing language-model-based systems for generating deep research reports. In particular, it targets their reliance on static plans, which limits insight, and their dependence on large proprietary models, with the attendant deployment and data security concerns.

#Review #Deep Research #Agentic Systems #Writing As Reasoning Policy (WARP) #Outline Generation #Iterative Refinement #Reinforcement Learning (RL) #Small Language Models

February 9, 2026

[Paper Review] Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

The paper aims to address the limited flexibility, reproducibility, and scalability of existing agent-based formal proving systems.

#Review #Agentic Systems #Formal Theorem Proving #Large Language Models (LLMs) #Lean Theorem Prover #Multi-Agent Systems #Code Generation #Automated Reasoning #Human-AI Collaboration

January 21, 2026

[Paper Review] Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts

This paper investigates whether state-of-the-art reasoning LLMs (Large Language Models), using minimal code scaffolding and basic tools, can carry out the full process from research ideation to writing the final paper with a high degree of autonomy.

#Review #Machine Learning Research #Autonomous Research #LLM Agents #Scientific Workflow #Failure Modes #Experimental Design #AI Scientist #Agentic Systems

January 7, 2026

[Paper Review] An Information Theoretic Perspective on Agentic System Design

The paper aims to address the lack of a systematic understanding of how to design agentic language model (LM) systems, in particular compressor-predictor architectures.

#Review #Agentic Systems #Language Models #Mutual Information #Rate-Distortion Theory #Compute Efficiency #Scaling Laws #Compressor-Predictor Architecture #On-device AI

December 29, 2025

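[Paper Review] Single-stream Policy Optimization

This paper aims to address two problems with existing group-based policy optimization methods for LLMs (such as GRPO): inefficiency, where degenerate groups cause a loss of learning signal, and poor scalability caused by synchronization barriers.

#Review #Reinforcement Learning #LLM Optimization #Policy Gradient #Variance Reduction #Adaptive Sampling #Scalability #Agentic Systems #RLVR

September 17, 2025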

[Paper Review] Universal Deep Research: Bring Your Own Model and Strategy

This paper raises the problem that existing deep research tools (DRTs) use fixed research strategies and offer limited model choice, which makes them hard to customize and makes it hard to build research strategies specialized for particular industries.

#Review #Agentic Systems #Language Models (LLMs) #Research Automation #Customizable Strategies #Code Generation #Deep Research #User-Defined Agents #Sandboxed Execution

September 3, 2025

[Paper Review] Soft Instruction De-escalation Defense

This paper aims to address the vulnerability to prompt injection attacks of LLM-based agent systems that interact with external environments. In particular, it proposes a defense mechanism that effectively neutralizes malicious instructions in untrusted data without degrading the agent's usefulness.

#Review #Prompt Injection #LLM Security #Agentic Systems #Iterative Sanitization #Instruction Control #Adversarial Robustness #Large Language Models

October 27, 2025

[Paper Review] RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

This study aims to address the limitations of large language model (LLM)-based agentic retrieval-augmented generation (RAG) systems, in particular their weak handling of complex multi-step questions and their weak intermediate reasoning.

#Review #Large Language Models #Retrieval Augmented Generation #Agentic Systems #Benchmarking #Intermediate Tasks #Error Analysis #LLM Evaluation

October 17, 2025

[Paper Review] In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

This paper raises the problem that existing tool-augmented LLM approaches scale poorly to long reasoning chains and diverse tool use, and generalize weakly to new scenarios.

#Review #Agentic Systems #Large Language Models (LLMs) #Tool Use #Reinforcement Learning (RL) #On-policy Optimization #Flow-based Group Refined Policy Optimization (Flow-GRPO) #Multi-turn Reasoning

October 8, 2025