#RaggedShard

1 post

[Paper Review] veScale-FSDP: Flexible and High-Performance FSDP at Scale

This paper addresses the limitations that existing FSDP (Fully Sharded Data Parallel) systems run into in structure-aware training, such as block-wise quantized training or training with non-element-wise optimizers like Shampoo and Muon.

#Review #FSDP #Distributed Training #LLM #GPU Scaling #Memory Optimization #Performance Optimization #Structure-Aware Training #RaggedShard

February 26, 2026