#Model Growth

1 post

[Paper Review] Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training

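To address the rapidly rising computational cost of large language model (LLM) pre-training, this paper proposes a method for growing a model by efficiently reusing the "sunk cost" already invested in existing pretrained checkpoints.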

#Review #Mixture-of-Experts #Large Language Models #Checkpoint Recycling #Model Growth #Efficient Pretraining #Depth Growth #Width Growth #Sunk Cost

October 10, 2025