#Recurrent Dynamics

1 post

[Paper Review] Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

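This paper addresses the problem of poorly initialized recurrent parameters when a Transformer is converted into a hybrid model. Prior work focuses on copying the Transformer's weights but does not account for the dynamics (decay, gate, etc.) of the newly introduced Gated DeltaNet (GDN) layers, so the initial model begins training from an unoptimized state.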

#Review #Hybrid Linear Attention #Gated DeltaNet #Model Distillation #Initialization #Softmax Attention #Knowledge Distillation #Recurrent Dynamics

June 18, 2026