#Deep Learning Optimization

4 posts

[sglang] SGLang MoE Kernel Optimization for Intel GPU Acceleration: An Analysis of GPT-OSS bf16 Support

A look at tuning fused_experts kernel parameters to maximize the efficiency of MoE computation for GPT-OSS models on Intel XPU.

#SGLang #Intel GPU #XPU #MoE #GPT-OSS #Deep Learning Optimization

April 13, 2026

[Paper Review] SageBwd: A Trainable Low-bit Attention

A detailed review of the paper 'SageBwd: A Trainable Low-bit Attention', published on arXiv.

#Review #Low-bit Attention #Quantization #Model Training #Pre-training #Backward Pass #QK-norm #SageBwd #Deep Learning Optimization

March 5, 2026

[Paper Review] MARS-M: When Variance Reduction Meets Matrices

A detailed review of the paper 'MARS-M: When Variance Reduction Meets Matrices', published on arXiv.

#Review #Variance Reduction #Matrix-based Optimizer #LLM Training #Deep Learning Optimization #Moonlight #MARS-M #Stochastic Gradient Descent

October 28, 2025

[Paper Review] Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

A detailed review of the paper 'Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention', published on arXiv.

#Review #Low-Precision Training #Flash Attention #Transformer #Numerical Stability #BF16 #Rounding Error #Gradient Bias #Deep Learning Optimization

October 9, 2025