#Muon

4 posts

[Paper Review] Why Muon Outperforms Adam: A Curvature Perspective

This paper seeks to identify the underlying geometric reason why Muon trains roughly 2x more efficiently than Adam in LLM pretraining.

#Review #Muon #Adam #Curvature #Normalized Directional Sharpness (NDS) #Large Language Model #Optimization Landscape #Hessian

June 8, 2026

[Paper Review] Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

This paper identifies the root cause of the Muon optimizer's performance degradation on downstream tasks beyond pretraining, particularly in VLA and RLVR settings.

#Review #Muon #Pretraining #Spectral Analysis #VLA #RLVR #Optimization #Deep Learning

May 24, 2026

[Paper Review] Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

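The paper targets a problem in large language model (LLM) training: weight decay (WD) pins the scale of weight matrices at a "noise-WD equilibrium," which prevents the model from learning a scale suited to the data.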

#Review #Large Language Models #Weight Decay #Learnable Multipliers #Scale Adaptation #Optimization #µP Parametrization #Adam #Muon

January 8, 2026

[Paper Review] NorMuon: Making Muon more efficient and scalable

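The goal is to overcome the limitations of the existing Muon optimizer in order to make large language model (LLM) training more efficient. Muon improves the conditioning of updates, but the per-neuron update norms still vary widely. The paper addresses this variance to make training dynamics more balanced, with the aim of improving overall convergence speed and scalability.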

#Review #LLM Training #Optimizer #Muon #Orthogonalization #Adaptive Learning Rates #Distributed Training #FSDP2 #NorMuon

October 9, 2025