[Paper Review] SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
A detailed review of the paper 'SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks', posted on arXiv.
#Review #SlopCodeBench #Coding Agents #Iterative Development #Code Quality #Structural Erosion #Verbosity #Benchmarking #Long-Horizon Tasks
March 26, 2026
[Paper Review] CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
A detailed review of the paper 'CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare', posted on arXiv.
#Review #Multi-Agent Framework #Healthcare Automation #Long-Horizon Tasks #Actor-Critic #Tool Grounding #Dual-Memory #CareFlow #GUI Agents
March 25, 2026
[Paper Review] Hindsight Credit Assignment for Long-Horizon LLM Agents
A detailed review of the paper 'Hindsight Credit Assignment for Long-Horizon LLM Agents', posted on arXiv by Yi Wen.
#Review #LLM Agents #Reinforcement Learning #Credit Assignment #Hindsight Credit Assignment #Policy Optimization #Sparse Rewards #Long-Horizon Tasks #Generative Verification
March 11, 2026
[Paper Review] AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
A detailed review of the paper 'AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios', posted on arXiv.
#Review #Multimodal Agents #Visual Reasoning #Tool Use #Benchmark #Long-Horizon Tasks #Realistic Scenarios #Agentic Intelligence
March 5, 2026
[Paper Review] Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
A detailed review of the paper 'Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory', posted on arXiv by Wei Wei.
#Review #LLM Agents #Long-Horizon Tasks #Memory Management #Indexed Experience Memory #Reinforcement Learning #Context Window #Tool Use #MEMEXRL
March 4, 2026
[Paper Review] GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
A detailed review of the paper 'GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL', posted on arXiv.
#Review #GUI Agents #Reinforcement Learning #Supervised Fine-tuning #Visual Grounding #Long-Horizon Tasks #Partial Verifiability #KL Regularization #Data Curation
February 25, 2026
[Paper Review] χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
A detailed review of the paper 'χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies', posted on arXiv.
#Review #Robotic Manipulation #Distributional Shift #Imitation Learning #Model Arithmetic #Stage Advantage #Train-Deploy Alignment #Resource-Efficient AI #Long-Horizon Tasks
February 12, 2026
[Paper Review] OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
A detailed review of the paper 'OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions', posted on arXiv by heroding77.
#Review #LLM Agents #Benchmarking #Inductive Reasoning #Long-Horizon Tasks #Active Exploration #World Models #Autonomous Discovery
February 8, 2026
[Paper Review] FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
A detailed review of the paper 'FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents', posted on arXiv.
#Review #LLM Agents #Deep Research #Long-Horizon Tasks #Test-Time Scaling #File System #Persistent Workspace #Knowledge Base #Dual-Agent Framework
February 2, 2026
[Paper Review] Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
A detailed review of the paper 'Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning', posted on arXiv by Shuai Zhang.
#Review #Agentic AI #Reinforcement Learning #Long-Horizon Tasks #Dynamic Branching #Strategic Exploration #LLM Agents #Sample Efficiency #Policy Optimization
January 28, 2026
[Paper Review] SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
A detailed review of the paper 'SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios', posted on arXiv by Nghi D. Q. Bui.
#Review #Coding Agents #Software Evolution #Benchmarking #Long-Horizon Tasks #Large Language Models (LLMs) #Software Engineering #Code Generation
December 24, 2025
[Paper Review] MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments
A detailed review of the paper 'MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments', posted on arXiv.
#Review #Mobile Agents #GUI Benchmarking #Agent-User Interaction #Tool-Augmented Agents #Model Context Protocol (MCP) #Long-Horizon Tasks #Reproducible Evaluation #Android Environment
December 22, 2025
[Paper Review] GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
A detailed review of the paper 'GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation', posted on arXiv.
#Review #Robotic Manipulation #Reinforcement Learning #Vision-Language-Action #Dexterous Control #Long-Horizon Tasks #Data Filtering #Data Augmentation #Foundation Models
December 1, 2025
[Paper Review] PRInTS: Reward Modeling for Long-Horizon Information Seeking
A detailed review of the paper 'PRInTS: Reward Modeling for Long-Horizon Information Seeking', posted on arXiv by Elias Stengel-Eskin.
#Review #Reward Modeling #Long-Horizon Tasks #Information Seeking #Large Language Models #Trajectory Summarization #Reinforcement Learning #Tool Use #Process Reward Models
November 24, 2025
[Paper Review] ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
A detailed review of the paper 'ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use', posted on arXiv by Guanting Dong.
#Review #Multimodal Agents #Tool-Augmented LLMs #Vision-Guided Reasoning #Long-Horizon Tasks #VQA #Global Planning #Context Preservation #Perceive Tool
November 9, 2025
[Paper Review] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
A detailed review of the paper 'The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution', posted on arXiv by Haoze Wu.
#Review #Language Agents #Tool Use #Benchmarking #Long-Horizon Tasks #Realistic Environments #Multi-Application #Execution-Based Evaluation #Model Context Protocol (MCP)
October 30, 2025
[Paper Review] AgentFold: Long-Horizon Web Agents with Proactive Context Management
A detailed review of the paper 'AgentFold: Long-Horizon Web Agents with Proactive Context Management', posted on arXiv.
#Review #Web Agents #Context Management #Long-Horizon Tasks #LLM #Deep Consolidation #Granular Condensation #ReAct Paradigm
October 29, 2025
[Paper Review] Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
A detailed review of the paper 'Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks', posted on arXiv by Xueyuan Lin.
#Review #Long-Horizon Tasks #Agentic AI #Context Curation #Working Memory #Reinforcement Learning #Policy Optimization #Large Language Models #Memory-as-Action
October 15, 2025
[Paper Review] A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
A detailed review of the paper 'A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks', posted on arXiv by Fanchao Qi.
#Review #Long-Horizon Tasks #LLM Agents #Global Planning #Reinforcement Learning #Supervised Fine-tuning #Homologous Consensus Filtering #Executor Capability Gain Reward #Plan-and-Execute
October 13, 2025
[Paper Review] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
A detailed review of the paper 'SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?', posted on arXiv by Yannis Yiming He.
#Review #AI Agents #Software Engineering #LLMs #Code Generation #Benchmark #Contamination Resistance #Long-Horizon Tasks #Enterprise Software
September 23, 2025
[Paper Review] WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
A detailed review of the paper 'WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents', posted on arXiv by Wenbiao Yin.
#Review #Agentic AI #Deep Research #Iterative Reasoning #Long-Horizon Tasks #Context Management #Data Synthesis #Tool-Augmented LLMs #Markov Decision Process
September 17, 2025
[Paper Review] The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
A detailed review of the paper 'The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs', posted on arXiv by Jonas Geiping.
#Review #Large Language Models #Long-Horizon Tasks #Execution Capability #Scaling Laws #Self-Conditioning #Thinking Models #Agentic AI
September 15, 2025
[Paper Review] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
A detailed review of the paper 'Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents', posted on arXiv by Xintao Wang.
#Review #LLM Agents #Reinforcement Learning #Policy Gradients #Entropy Modulation #Credit Assignment #Uncertainty #Long-Horizon Tasks #Self-Calibrating Gradient Scaling
September 12, 2025