#LLM Training

18 posts

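[Axolotl] Add batch flattening/packing support to the GRPO trainer

An analysis of batch flattening, a technique that removes padding tokens from the scoring forward pass of the GRPO reinforcement learning trainer and delivers a 20-34% performance improvement.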

#Axolotl #GRPO #LLM Training #Performance #Flash Attention #PyTorch #Reinforcement Learning

March 28, 2026

[Axolotl] Add bias, dropout, DoRA, and embedding support to the LoRA kernels

An analysis of extending Axolotl's Triton LoRA kernels to support bias parameters, dropout, DoRA (Weight-Decomposed LoRA), and embedding layers.

#Axolotl #LoRA #DoRA #Triton #LLM Training #Performance #PEFT

March 22, 2026

[Axolotl] Add Liger kernel support for Qwen 3.5 models and a fused RMSNorm+Gated kernel

An analysis of adding Liger FLCE kernel support for the Qwen 3.5 / Qwen 3.5 MoE models and a fused RMSNorm+SiLU gate Triton kernel to Axolotl.

#Axolotl #Liger Kernel #Qwen 3.5 #RMSNorm #Triton #LLM Training #Performance

March 22, 2026

[Paper Review] How Far Can Unsupervised RLVR Scale LLM Training?

This paper aims to comprehensively analyze how far Unsupervised Reinforcement Learning with Verifiable Rewards (URLVR), which obtains rewards without ground-truth labels, can scale large language model (LLM) training.

#Review #Unsupervised Reinforcement Learning #LLM Training #Intrinsic Rewards #External Rewards #Model Collapse #RLVR #Model Prior #Self-Verification

March 9, 2026

[Paper Review] MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

This paper aims to address the impracticality of directly modeling P(hypothesis|background) in scientific discovery with large language models (LLMs), which stems from its combinatorial complexity (O(N^k)).

#Review #Scientific Discovery #LLM Training #Combinatorial Complexity #Hierarchical Search #Bounded Composition #Motivation Planning #Tractable Training #TOMATO-STAR Dataset

March 5, 2026

[Paper Review] VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

This paper aims to address the instability of off-policy reinforcement learning for large language models (LLMs), namely the high importance sampling (IS) variance caused by policy staleness, asynchronous training, and training-inference mismatch.

#Review #Off-Policy RL #LLM Training #Importance Sampling #Variance Reduction #Variational Optimization #Policy Gradient #Sequence-Level Optimization #Reinforcement Learning

February 22, 2026

[Paper Review] ArXiv-to-Model: A Practical Study of Scientific LM Training

This study aims to document a practical and transparent process for training a domain-specific scientific language model (Scientific LM) from raw arXiv LaTeX sources.

#Review #Scientific Language Models #LLM Training #ArXiv #LaTeX Processing #Tokenization #Resource Constraints #Pretraining #Data Engineering

February 19, 2026

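[Paper Review] On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

The goal is to challenge the limits of the dense adaptive optimizers commonly used to train large language models (LLMs) and to demonstrate that random update masking can improve optimization performance. In particular, the paper proposes Magma, a new masking technique that exploits momentum-gradient alignment, to improve the stability and generalization of LLM training.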

#Review #Adaptive Optimizers #Gradient Masking #LLM Training #Geometric Regularization #Momentum Alignment #RMSProp #Perplexity #Deep Learning

February 17, 2026

[Paper Review] Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

This paper explores how to make the most effective use of limited high-quality data during supervised fine-tuning (SFT) on Chain-of-Thought (CoT) data.

#Review #Supervised Fine-tuning (SFT) #Chain-of-Thought (CoT) #Data Repetition #Data Scaling #LLM Training #Generalization #Overfitting #Reasoning Models

February 11, 2026

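[Paper Review] daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Large language models (LLMs) perform well on short-horizon tasks, but high-quality training data for extending them to complex, realistic long-horizon agentic workflows is scarce. This paper sets out to address that shortage.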

#Review #Long-Horizon Agency #Data Synthesis #Pull Request Chains #Software Evolution #LLM Training #Agentic AI #Self-Distillation #Code Generation

February 3, 2026

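[Paper Review] Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

This paper addresses computational inefficiency in reinforcement learning (RL) training pipelines for large language models (LLMs), in particular the bottleneck of the rollout phase, which accounts for more than 70% of total training time.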

#Review #Reinforcement Learning #FP8 Quantization #LLM Training #On-Policy RL #Unified Precision Flow #Training Efficiency #Rollout Acceleration

January 25, 2026

[Paper Review] Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

This paper addresses the inconsistent performance of multi-agent systems based on large language models (LLMs) in specific domains.

#Review #Multi-Agent Systems #Reinforcement Learning #LLM Training #Hierarchical Credit Assignment #Trajectory Alignment #Group Relative Policy Optimization #Tool-Augmented Reasoning #Vertical Architecture

November 24, 2025

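[Paper Review] TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

During language model pre-training, a fixed data mixture strategy fails to reach optimal performance because the model's learning preferences change dynamically. This paper aims to observe these evolving data preferences efficiently and to adjust the data mixture ratios dynamically on that basis, maximizing model performance.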

#Review #Language Model Pre-training #Dynamic Data Mixing #Data Influence #Group Influence #Optimization #Regression Model #LLM Training

September 1, 2025

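[Paper Review] Cyber-Zero: Training Cybersecurity Agents without Runtime

Existing software engineering agents based on large language models (LLMs) learn through execution environments, but such environments are scarce in the cybersecurity domain, which makes high-quality training data hard to obtain.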

#Review #Cybersecurity Agents #LLM Training #Trajectory Synthesis #Runtime-Free Training #CTF Challenges #LLM Simulation

August 5, 2025

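[Paper Review] MARS-M: When Variance Reduction Meets Matrices

This paper aims to combine the strengths of matrix-based preconditioned optimizers with those of variance reduction techniques, improving the efficiency and stability of training large language models (LLMs) and other deep learning models.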

#Review #Variance Reduction #Matrix-based Optimizer #LLM Training #Deep Learning Optimization #Moonlight #MARS-M #Stochastic Gradient Descent

October 28, 2025

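[Paper Review] COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

The paper aims to address the systematic deficiencies that large language models (LLMs) show in non-English creative writing, Chinese in particular (e.g., predictable narratives, limited stylistic diversity, cultural inconsistency).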

#Review #Chinese Creative Writing #Process Supervision #LLM Training #Dataset Creation #Cross-Lingual Transfer #Narrative Logic #Linguistic Expression #Type-Token Ratio

October 17, 2025

[Paper Review] Revisiting Long-context Modeling from Context Denoising Perspective

This study aims to address the vulnerability of Long-context Models (LCMs) to irrelevant tokens in the context (contextual noise), which misdirect the model's attention and degrade performance.

#Review #Long-context Models #Context Denoising #Integrated Gradient #LLM Training #Context Window Scaling #Information Flow #Attention Mechanism

October 9, 2025

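[Paper Review] NorMuon: Making Muon more efficient and scalable

The goal is to overcome the limitations of the existing Muon optimizer in order to improve the training efficiency of large language models (LLMs). Muon improves the conditioning of updates, but the variance of per-neuron update norms remains high. The paper addresses this issue to make training dynamics more balanced and to improve overall convergence speed and scalability.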

#Review #LLM Training #Optimizer #Muon #Orthogonalization #Adaptive Learning Rates #Distributed Training #FSDP2 #NorMuon

October 9, 2025