[Paper Review] Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

This paper aims to identify and resolve an overlooked bottleneck in On-Policy Self-Distillation (OPSD) for LLM reasoning: teacher-side exposure mismatch.

#Review #Self-Distillation #LLM Reasoning #Teacher Exposure #On-Policy #Adaptive Control #Reinforcement Learning #Beta-policy

May 14, 2026