#Prefix Failure

1 post

[Paper Review] Trajectory-Refined Distillation

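This paper sets out to solve the Prefix Failure problem, a structural weakness of on-policy distillation (OPD), which is widely used in the post-training of modern LLMs. Prior work tried to address the problem by modifying token-level loss functions or reweighting specific tokens, but those approaches could not correct the root cause of a failed trajectory.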

#Review #On-policy Distillation #Prefix Failure #Trajectory-Refined Distillation #Large Language Models #Self-distillation #Policy Gradient #Alignment

June 8, 2026