#Sample Reweighting

1 post

[Paper Review] ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

This paper argues that standard OPD and OPSD treat all SGOs equally and therefore miss opportunities for more efficient learning.

#Review #On-Policy Distillation #Language Model Post-training #Sample Reweighting #Negative Trajectory #Reasoning #Knowledge Distillation #Prefix-based Training

June 24, 2026