#GRPO

65 posts

[Paper Review] Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

[Paper Review] Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

[Paper Review] Self-Distilled Agentic Reinforcement Learning

[Paper Review] RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

[Paper Review] MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

[Paper Review] F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

[Paper Review] UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

[Paper Review] Can Large Language Models Reinvent Foundational Algorithms?

[Paper Review] Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

[Paper Review] FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

[Paper Review] From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

[Paper Review] RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

[Paper Review] Agentic Critical Training

[Paper Review] Specificity-aware reinforcement learning for fine-grained open-world classification

[Paper Review] On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

[Paper Review] Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

[Paper Review] Self-Hinting Language Models Enhance Reinforcement Learning

[Paper Review] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

[Paper Review] The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

[Paper Review] Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization

[Paper Review] VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

[Paper Review] Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

[Paper Review] See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

[Paper Review] TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

[Paper Review] On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

[Paper Review] TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

[Paper Review] SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

[Paper Review] Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

[Paper Review] Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

[Paper Review] Reinforced Visual Perception with Tools

[Paper Review] Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

[Paper Review] TempFlow-GRPO: When Timing Matters for GRPO in Flow Models

[Paper Review] UI-Venus Technical Report: Building High-performance UI Agents with RFT

[Paper Review] ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

[Paper Review] Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

[Paper Review] RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

[Paper Review] Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
