[Paper Review] VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
A detailed review of the paper 'VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward', posted on arXiv.
#Review #Video Diffusion Models #Geometric Consistency #Reinforcement Learning #Latent Geometry Model #4D Reconstruction #Group Relative Policy Optimization
March 31, 2026

[Paper Review] Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation
A detailed review of the paper 'Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation', posted on arXiv.
#Review #Reinforcement Learning #LLM Reasoning #Group Relative Policy Optimization #Advantage Estimation #Exploration-Exploitation #Curriculum Learning #Multi-modal LLMs
February 12, 2026

[Paper Review] Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards
A detailed review of the paper 'Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards', posted on arXiv.
#Review #Reinforcement Learning #LLMs #Credit Assignment #Multi-Objective Optimization #Advantage Estimation #Calibration #Structured Generation #Group Relative Policy Optimization
February 11, 2026

[Paper Review] Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
A detailed review of the paper 'Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO', posted on arXiv.
#Review #Reinforcement Learning #Flow Matching #Text-to-Image Generation #Sparse Rewards #Credit Assignment #Turning Points #Group Relative Policy Optimization
February 9, 2026

[Paper Review] Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
A detailed review of the paper 'Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities', posted on arXiv by Ivan Oseledets.
#Review #Reinforcement Learning #LLM Reasoning #Exploration-Exploitation #Group Relative Policy Optimization #Entropy Collapse #Generative Models #Confidence-Aware Rewards
February 8, 2026

[Paper Review] E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
A detailed review of the paper 'E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models', posted on arXiv.
#Review #Reinforcement Learning #Flow Models #Entropy-aware Sampling #Group Relative Policy Optimization #SDE #Human Preference Alignment #Image Generation
January 7, 2026

[Paper Review] Thinking with Images via Self-Calling Agent
A detailed review of the paper 'Thinking with Images via Self-Calling Agent', posted on arXiv by Qixiang Ye.
#Review #Multimodal LLMs #Self-Calling Chain-of-Thought #Reinforcement Learning #Visual Reasoning #Agentic AI #Tool Calling #Group Relative Policy Optimization
December 11, 2025

[Paper Review] SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
A detailed review of the paper 'SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment', posted on arXiv by Yi Yang.
#Review #LLM Alignment #Stable Rank #Intrinsic Reward #Reinforcement Learning #Geometric Properties #Group Relative Policy Optimization #Annotation-Free Alignment
December 3, 2025

[Paper Review] Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
A detailed review of the paper 'Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO', posted on arXiv.
#Review #Multi-Agent Systems #Reinforcement Learning #LLM Training #Hierarchical Credit Assignment #Trajectory Alignment #Group Relative Policy Optimization #Tool-Augmented Reasoning #Vertical Architecture
November 24, 2025

[Paper Review] VisPlay: Self-Evolving Vision-Language Models from Images
A detailed review of the paper 'VisPlay: Self-Evolving Vision-Language Models from Images', posted on arXiv.
#Review #Self-Evolving #Vision-Language Models #Reinforcement Learning #Self-Play #Unlabeled Data #Multimodal Reasoning #Group Relative Policy Optimization #Hallucination Mitigation
November 19, 2025

[Paper Review] Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
A detailed review of the paper 'Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning', posted on arXiv.
#Review #Conversational Recommender Systems #Large Language Models #Reinforcement Learning #Group Relative Policy Optimization #Rank-based Learning #Supervised Fine-tuning #Reward Shaping
November 9, 2025

[Paper Review] PairUni: Pairwise Training for Unified Multimodal Language Models
A detailed review of the paper 'PairUni: Pairwise Training for Unified Multimodal Language Models', posted on arXiv.
#Review #Unified Vision-Language Models #Reinforcement Learning #Multimodal Alignment #Pairwise Training #Group Relative Policy Optimization #Data Augmentation #Text-to-Image Generation #Visual Reasoning
October 30, 2025

[Paper Review] Training-Free Group Relative Policy Optimization
A detailed review of the paper 'Training-Free Group Relative Policy Optimization', posted on arXiv.
#Review #LLM Agents #Reinforcement Learning #Parameter-Free Optimization #Experiential Knowledge #Token Prior #Group Relative Policy Optimization #In-Context Learning #Cost-Effective AI
October 10, 2025

[Paper Review] No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
A detailed review of the paper 'No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping', posted on arXiv.
#Review #LLM Reinforcement Learning #Zero-Variance Prompts #Advantage Shaping #Entropy-Guided #Math Reasoning #RLVR #Group Relative Policy Optimization
September 29, 2025

[Paper Review] Train Long, Think Short: Curriculum Learning for Efficient Reasoning
A detailed review of the paper 'Train Long, Think Short: Curriculum Learning for Efficient Reasoning', posted on arXiv by Marzyeh Ghassemi.
#Review #Curriculum Learning #Reinforcement Learning #Large Language Models #Reasoning Efficiency #Token Budget Control #Group Relative Policy Optimization #Chain-of-Thought
August 13, 2025