본문으로 건너뛰기

#Long-Horizon Tasks

25개의 포스트

[논문리뷰] Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

댓글 수 로딩 중

[논문리뷰] SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

댓글 수 로딩 중

[논문리뷰] CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare

댓글 수 로딩 중

[논문리뷰] AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

댓글 수 로딩 중

[논문리뷰] Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

댓글 수 로딩 중

[논문리뷰] GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

댓글 수 로딩 중

[논문리뷰] χ_{0}: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

댓글 수 로딩 중

[논문리뷰] OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

댓글 수 로딩 중

[논문리뷰] FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

댓글 수 로딩 중

[논문리뷰] Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

댓글 수 로딩 중

[논문리뷰] MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

댓글 수 로딩 중

[논문리뷰] GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

댓글 수 로딩 중

[논문리뷰] ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use

댓글 수 로딩 중

[논문리뷰] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

댓글 수 로딩 중

[논문리뷰] WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

댓글 수 로딩 중

[논문리뷰] The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

댓글 수 로딩 중

[논문리뷰] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

댓글 수 로딩 중

[논문리뷰] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

댓글 수 로딩 중

[논문리뷰] Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

댓글 수 로딩 중

[논문리뷰] A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks

댓글 수 로딩 중