#Reward Modeling

45 posts

[Paper Review] Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

[Paper Review] C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences

[Paper Review] Personalizing Text-to-Image Generation to Individual Taste

[Paper Review] Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

[Paper Review] Video-Based Reward Modeling for Computer-Use Agents

[Paper Review] Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

[Paper Review] Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

[Paper Review] CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

[Paper Review] Enhancing Spatial Understanding in Image Generation via Reward Modeling

[Paper Review] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

[Paper Review] TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics

[Paper Review] RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

[Paper Review] PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

[Paper Review] LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

[Paper Review] ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing

[Paper Review] Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

[Paper Review] EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

[Paper Review] Language Self-Play For Data-Free Training

[Paper Review] Improving Large Vision and Language Models by Learning from a Panel of Peers

[Paper Review] FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation

[Paper Review] FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

[Paper Review] Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

[Paper Review] Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

[Paper Review] Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
