#Turning Points

1 post

[Paper Review] Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

This paper aims to address the sparse rewards problem that arises when Flow Matching models and Group Relative Policy Optimization (GRPO) are applied to text-to-image generation.

#Review #Reinforcement Learning #Flow Matching #Text-to-Image Generation #Sparse Rewards #Credit Assignment #Turning Points #Group Relative Policy Optimization

February 9, 2026