#Directional Updates

1 post

[Paper Review] On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

The reasoning capability of Large Language Models (LLMs) has advanced significantly through techniques such as Reinforcement Learning with Verifiable Rewards (RLVR).

#Review #RLVR #LLM Reasoning #Log Probability Difference #Directional Updates #Test-Time Extrapolation #Advantage Reweighting #Sparse Updates

March 23, 2026