#Beta-policy

1개의 포스트

[논문리뷰] Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

본 논문은 LLM reasoning을 위한 On-Policy Self-Distillation (OPSD)에서 teacher-side exposure mismatch라는 간과된 bottleneck을 식별하고 해결하고자 합니다.

#Review #Self-Distillation #LLM Reasoning #Teacher Exposure #On-Policy #Adaptive Control #Reinforcement Learning #Beta-policy

2026년 5월 14일