#Efficient Pretraining

2 posts

[Paper Review] HRM-Text: Efficient Pretraining Beyond Scaling

This paper aims to address the severe inefficiency of the existing Large Language Model (LLM) pretraining paradigm, which depends on large-scale compute resources and internet-scale raw text.

#Review #Hierarchical Recurrent Model #Efficient Pretraining #MagicNorm #Task-completion Objective #PrefixLM #Compute Efficiency

May 20, 2026

[Paper Review] Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training

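To address the rapidly rising computational cost of large language model (LLM) pre-training, this paper proposes a method for growing a model by efficiently recycling the "sunk cost" already invested in existing pretrained checkpoints.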

#Review #Mixture-of-Experts #Large Language Models #Checkpoint Recycling #Model Growth #Efficient Pretraining #Depth Growth #Width Growth #Sunk Cost

October 10, 2025