#Information Asymmetry

1개의 포스트

[논문리뷰] Self-Distilled RLVR

본 논문은 OPSD 가 훈련 초기에는 성능 향상을 보이나, 곧 정보 누출(Information Leakage)로 인해 성능이 저하되는 원인을 규명하고 이를 해결하고자 합니다.

#Review #LLM Post-training #Reinforcement Learning #Self-Distillation #Information Asymmetry #Credit Assignment #RLVR

2026년 4월 5일