[Paper Review] Self-Distilled RLVR | A detailed review of the paper 'Self-Distilled RLVR', posted to arXiv by Naibin Gu. | #Review, #LLM Post-training, #Reinforcement Learning, #Self-Distillation, #Information Asymmetry, #Credit Assignment, #RLVR | April 5, 2026
[Paper Review] FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization | A detailed review of the paper 'FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization', posted to arXiv. | #Review, #Reinforcement Learning, #Large Language Models, #Future-KL, #Policy Optimization, #GRPO, #Chain-of-Thought, #Credit Assignment | March 31, 2026
[Paper Review] UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience | A detailed review of the paper 'UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience', posted to arXiv by Yiming Gao. | #Review, #GUI Agent, #Self-Evolving Learning, #Rejection Fine-Tuning (RFT), #Group Relative Self-Distillation (GRSD), #Credit Assignment, #Sparse Rewards, #Mobile Automation, #Multimodal Large Language Models (MLLMs) | March 25, 2026
[Paper Review] Hindsight Credit Assignment for Long-Horizon LLM Agents | A detailed review of the paper 'Hindsight Credit Assignment for Long-Horizon LLM Agents', posted to arXiv by Yi Wen. | #Review, #LLM Agents, #Reinforcement Learning, #Credit Assignment, #Hindsight Credit Assignment, #Policy Optimization, #Sparse Rewards, #Long-Horizon Tasks, #Generative Verification | March 11, 2026
[Paper Review] InfoPO: Information-Driven Policy Optimization for User-Centric Agents | A detailed review of the paper 'InfoPO: Information-Driven Policy Optimization for User-Centric Agents', posted to arXiv by Yuyu Luo. | #Review, #Reinforcement Learning, #Large Language Models, #Policy Optimization, #Information Gain, #Credit Assignment, #Multi-turn Interaction, #User-centric Agents, #Counterfactual Reasoning | March 3, 2026
[Paper Review] Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards | A detailed review of the paper 'Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards', posted to arXiv. | #Review, #Reinforcement Learning, #LLMs, #Credit Assignment, #Multi-Objective Optimization, #Advantage Estimation, #Calibration, #Structured Generation, #Group Relative Policy Optimization | February 11, 2026
[Paper Review] Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO | A detailed review of the paper 'Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO', posted to arXiv. | #Review, #Reinforcement Learning, #Flow Matching, #Text-to-Image Generation, #Sparse Rewards, #Credit Assignment, #Turning Points, #Group Relative Policy Optimization | February 9, 2026
[Paper Review] Reinforcement Learning via Self-Distillation | A detailed review of the paper 'Reinforcement Learning via Self-Distillation', posted to arXiv. | #Review, #Reinforcement Learning, #Self-Distillation, #Large Language Models (LLMs), #Rich Feedback, #Credit Assignment, #Policy Optimization, #RLHF, #Code Generation, #Test-Time Training | January 28, 2026
[Paper Review] MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching | A detailed review of the paper 'MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching', posted to arXiv. | #Review, #Tool-Integrated Reasoning, #LLMs, #Reinforcement Learning, #Fine-Grained Supervision, #Bipartite Matching, #Credit Assignment, #Advantage Estimation | January 15, 2026
[Paper Review] Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning | A detailed review of the paper 'Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning', posted to arXiv. | #Review, #Multi-Agent Systems, #Reinforcement Learning, #Test-Time Adaptation, #Large Language Models, #Collaborative Reasoning, #Credit Assignment, #Textual Experience, #Distribution Shift Robustness | January 15, 2026
[Paper Review] AT^2PO: Agentic Turn-based Policy Optimization via Tree Search | A detailed review of the paper 'AT^2PO: Agentic Turn-based Policy Optimization via Tree Search', posted to arXiv. | #Review, #Agentic RL, #Multi-turn Tasks, #Policy Optimization, #Tree Search, #Credit Assignment, #Exploration Diversity, #LLM Agents | January 8, 2026
[Paper Review] VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation | A detailed review of the paper 'VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation', posted to arXiv. | #Review, #Visual Autoregressive Models, #Reinforcement Learning, #Policy Conflicts, #GRPO, #Text-to-Image Generation, #Credit Assignment, #Multi-scale Generation | January 5, 2026
[Paper Review] TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models | A detailed review of the paper 'TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models', posted to arXiv by Weirui Ye. | #Review, #Reinforcement Learning, #Diffusion Models, #Generative Models, #Tree Search, #Sample Efficiency, #Credit Assignment, #GRPO, #Visual Generative Models | December 9, 2025
[Paper Review] VIDEOP2R: Video Understanding from Perception to Reasoning | A detailed review of the paper 'VIDEOP2R: Video Understanding from Perception to Reasoning', posted to arXiv. | #Review, #Video Understanding, #Reinforcement Fine-Tuning (RFT), #Large Video Language Models (LVLMs), #Perception and Reasoning, #Chain-of-Thought (CoT), #Process-Aware Learning, #Policy Optimization, #Credit Assignment | November 18, 2025
[Paper Review] Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization | A detailed review of the paper 'Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization', posted to arXiv. | #Review, #LLM Reasoning, #Attention Mechanisms, #Reinforcement Learning, #Credit Assignment, #Policy Optimization, #Interpretability, #Preplan-and-Anchor Rhythm, #Generative Models | October 16, 2025
[Paper Review] Multi-Agent Tool-Integrated Policy Optimization | A detailed review of the paper 'Multi-Agent Tool-Integrated Policy Optimization', posted to arXiv by Lidong Bing. | #Review, #Multi-Agent RL, #Tool-Integrated Planning, #Large Language Models (LLMs), #Policy Optimization, #Credit Assignment, #Reinforcement Learning, #MATPO | October 9, 2025
[Paper Review] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents | A detailed review of the paper 'Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents', posted to arXiv by Xintao Wang. | #Review, #LLM Agents, #Reinforcement Learning, #Policy Gradients, #Entropy Modulation, #Credit Assignment, #Uncertainty, #Long-Horizon Tasks, #Self-Calibrating Gradient Scaling | September 12, 2025