#Self-Imitation Learning

1개의 포스트

[논문리뷰] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

본 논문의 핵심 목표는 장기적인(long-horizon), 희소한 보상(sparsely-rewarded)을 가진 LLM 에이전트 태스크에서 강화 학습(RL)의 근본적인 문제인 탐색-활용 트레이드오프(exploration-exploitation trade-off) 를 효과적으로 관리하는 것입니다.

#Review #Reinforcement Learning #LLM Agents #Exploration-Exploitation #Self-Imitation Learning #Intrinsic Rewards #Curriculum Learning #Policy Entropy #Tool Use

2025년 9월 29일