#Muon Optimizer

2개의 포스트

[논문리뷰] Parallax: Parameterized Local Linear Attention for Language Modeling

본 논문은 대규모 언어 모델(LLM) 학습에서 Softmax Attention이 가지는 구조적 한계를 극복하고 효율성을 높이는 것을 목표로 한다.

#Review #Local Linear Attention #Language Modeling #Muon Optimizer #Parameterized Attention #Arithmetic Intensity

2026년 5월 28일

[논문리뷰] Arcee Trinity Large Technical Report

본 논문은 희소한 Mixture-of-Experts (MoE) 아키텍처를 기반으로 하는 대규모 언어 모델인 Trinity Large 를 개발하고, 효율적인 학습 및 추론 성능과 높은 안정성을 달성하는 것을 목표로 합니다.

#Review #Mixture-of-Experts #Sparse LLM #Training Stability #Load Balancing #MoE #Transformer Architecture #Context Extension #Muon Optimizer

2026년 2월 19일