#Strategy Nudging

1 post

[Paper Review] Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

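This paper aims to resolve the exploration bottleneck, a chronic problem in RLVR settings. Existing approaches try to improve exploration efficiency by simply increasing the number of sampled rollouts, but this is computationally very expensive and still struggles to discover rare correct reasoning paths that lie in the long tail.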

#Review #RLVR #Reinforcement Learning #Exploration #LLM Reasoning #Strategy Nudging #Inter-Intra Group Advantage #Distillation

May 17, 2026