#Rollout Size

1개의 포스트

[논문리뷰] BroRL: Scaling Reinforcement Learning via Broadened Exploration

이 논문은 대규모 언어 모델(LLM)의 복잡한 추론 능력을 향상시키기 위한 Verifiable Rewards (RLVR) 기반 강화 학습(RL)의 스케일링 한계를 극복하는 것을 목표로 합니다.

#Review #Reinforcement Learning #LLMs #Scaling Laws #Exploration #Rollout Size #Verifiable Rewards #PPO #Mass Balance Equation

2025년 10월 2일