본문으로 건너뛰기

#On-Policy RL

3개의 포스트

[논문리뷰] Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

댓글 수 로딩 중

[논문리뷰] On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

댓글 수 로딩 중