[논문리뷰] Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVRYing Nian Wu이 arXiv에 게시한 'Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR' 논문에 대한 자세한 리뷰입니다.#Review#Reinforcement Learning#Large Language Models#Self-Play#Variational Problem Synthesis#Policy Entropy#Pass@k#Reasoning Benchmarks2025년 8월 25일댓글 수 로딩 중