[Paper Review] The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
A detailed review of the paper 'The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models', posted on arXiv.
#Review #Diffusion Language Models #Reasoning #Reinforcement Learning #Autoregressive Models #Generation Order #Entropy Degradation #Pass@k #GRPO
January 22, 2026
[Paper Review] Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
A detailed review of the paper 'Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs', posted on arXiv.
#Review #Reinforcement Learning (RL) #Large Language Models (LLMs) #Exploration Collapse #Strategy-level Diversity #Uniqueness-Aware Rewarding #Creative Problem Solving #Pass@k
January 15, 2026
[Paper Review] The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
A detailed review of the paper 'The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward', posted on arXiv by Xiaoyu Tan.
#Review #Reinforcement Learning #Large Language Models (LLMs) #Diversity Collapse #f-divergence #Forward-KL #JS-divergence #Pass@k #Catastrophic Forgetting
September 12, 2025
[Paper Review] Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
A detailed review of the paper 'Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR', posted on arXiv by Ying Nian Wu.
#Review #Reinforcement Learning #Large Language Models #Self-Play #Variational Problem Synthesis #Policy Entropy #Pass@k #Reasoning Benchmarks
August 25, 2025
[Paper Review] Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
A detailed review of the paper 'Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models', posted on arXiv by Qinghao Ye.
#Review #Reinforcement Learning #Large Language Models #Exploration-Exploitation #Reward Design #Reasoning Tasks #Pass@k #Policy Optimization
August 15, 2025