#Reverse-KL

1 post

[Paper Review] The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

OPD and OPSD are effective at internalizing system prompts and knowledge, yet recent studies report training instability and performance degradation. This paper sets out to identify the root causes of those problems.

#Review #On-Policy Distillation #Self-Distillation #Language Models #Reverse-KL #Privileged Information #Optimization Stability #RLVR

May 12, 2026