#Agentic RL

4 posts

[Paper Review] PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

This paper tackles a fundamental limitation of credit assignment in long-horizon agentic tasks, one caused by sparse rewards.

#Review #Reinforcement Learning #Long-Horizon Credit Assignment #Bayesian Inference #Self-Distillation #Search Agents #Agentic RL

June 8, 2026

[Paper Review] RAGEN-2: Reasoning Collapse in Agentic RL

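This paper proposes a Mutual Information (MI)-based diagnostic framework and an SNR-Aware Filtering technique. The authors decompose reasoning quality into within-input diversity (entropy) and cross-input distinguishability (MI). During training, they track an MI proxy to detect template collapse early.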

#Review #Agentic RL #Reasoning Collapse #Mutual Information #Signal-to-Noise Ratio #Reward Variance #Template Collapse

April 8, 2026

[Paper Review] AT^2PO: Agentic Turn-based Policy Optimization via Tree Search

This paper addresses three core problems that LLM agents face in multi-turn tasks.

#Review #Agentic RL #Multi-turn Tasks #Policy Optimization #Tree Search #Credit Assignment #Exploration Diversity #LLM Agents

January 8, 2026

[Paper Review] SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

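This paper addresses a major bottleneck in agentic reinforcement learning (Agentic RL) for building autonomous agents that handle complex GUI tasks: task-completion verification is both inefficient and unreliable.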

#Review #Agentic RL #Self-Verifying Agents #GUI Automation #Evidence Curation #LLM-as-a-Judge #Reward Shaping #AndroidLab

December 29, 2025