#Credit Assignment

23 posts

[Paper Review] From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

[Paper Review] DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

[Paper Review] CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

[Paper Review] FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

[Paper Review] UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

[Paper Review] InfoPO: Information-Driven Policy Optimization for User-Centric Agents

[Paper Review] Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

[Paper Review] Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

[Paper Review] Reinforcement Learning via Self-Distillation

[Paper Review] MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

[Paper Review] VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

[Paper Review] TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

[Paper Review] VIDEOP2R: Video Understanding from Perception to Reasoning

[Paper Review] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

[Paper Review] Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
