#Intrinsic Rewards

2 posts

[Paper Review] How Far Can Unsupervised RLVR Scale LLM Training?

This paper aims to comprehensively analyze how far Unsupervised Reinforcement Learning with Verifiable Rewards (URLVR), which obtains rewards without ground-truth labels, can scale large language model (LLM) training.

#Review #Unsupervised Reinforcement Learning #LLM Training #Intrinsic Rewards #External Rewards #Model Collapse #RLVR #Model Prior #Self-Verification

March 9, 2026

[Paper Review] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

The paper's core goal is to effectively manage the exploration-exploitation trade-off, a fundamental problem in reinforcement learning (RL), in long-horizon, sparsely rewarded LLM agent tasks.

#Review #Reinforcement Learning #LLM Agents #Exploration-Exploitation #Self-Imitation Learning #Intrinsic Rewards #Curriculum Learning #Policy Entropy #Tool Use

September 29, 2025