#Group Relative Policy Optimization

16 posts

[Paper Review] Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

[Paper Review] Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

[Paper Review] Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities

[Paper Review] E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

[Paper Review] SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

[Paper Review] VisPlay: Self-Evolving Vision-Language Models from Images

[Paper Review] Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

[Paper Review] No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

[Paper Review] PairUni: Pairwise Training for Unified Multimodal Language Models

[Paper Review] Training-Free Group Relative Policy Optimization
