#SRPO

1 post

[Paper Review] Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

This paper aims to overcome the limitations of existing online reinforcement learning (Online-RL) methods for aligning diffusion models.

#Review #Diffusion Models #Reinforcement Learning #Human Preference #Text-to-Image Generation #Reward Hacking #Direct-Align #SRPO #Fine-Grained Control #Flow Matching Models

September 10, 2025