[Paper Review] Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

A detailed review of the arXiv paper 'Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation'.

#Review #On-Policy Distillation #Reward Extrapolation #Large Language Models (LLMs) #Knowledge Distillation #Reinforcement Learning #Math Reasoning #Code Generation #Multi-teacher Distillation

February 12, 2026