#Mixture-of-Experts (MoE)

26개의 포스트

[논문리뷰] The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

본 논문은 large language model (LLM)이 long-horizon agentic workflow로 전환됨에 따라 발생하는 efficiency 및 cost bottleneck 문제와 intrinsically complex, high-stakes task 해결의 어려움을 다룹니다.

#Review #Mixture-of-Experts (MoE)#Mini Activations #Agentic AI #Self-Evolution #Reinforcement Learning (RL)#Multi-Token Prediction (MTP)

2026년 5월 26일

[논문리뷰] Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

본 논문은 LMM의 표준 post-training 파이프라인인 SFT→RLVR에서 발생하는 distributional drift 문제를 해결하고자 한다. 기존의 SFT는 토큰 수준의 uniform objective에 의존하여 모델이 피상적인 패턴만을 학습하게 만들며, 이는 모델의 본래 성능을 왜곡하는 결과를 초래한다.

#Review #Multimodal LLM #Reinforcement Learning #On-Policy Distillation #Distributional Drift #Mixture-of-Experts (MoE)#Adversarial Alignment

2026년 5월 5일

[논문리뷰] CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing

이 논문은 기존의 통합 이미지 편집 모델들이 고정된 공유 백본을 사용함으로써 다중 조건(텍스트, 마스크, 참조 이미지) 입력 시 발생하는 태스크 간섭, 색상 번짐, 정체성/스타일 왜곡 등의 문제를 해결하고자 합니다.

#Review #Image Editing #Diffusion Models #Mixture-of-Experts (MoE)#Condition-Aware Routing #Contextual Image Editing #Mask Repaint #Latent Mixture #Diffusion Transformer

2026년 3월 9일

[논문리뷰] Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

기존 시계열 파운데이션 모델의 확장성 병목 현상 을 해결하고, 시계열 예측의 본질적인 직렬적 특성 을 고려하여 추론 비용을 줄이면서 훨씬 강력한 예측 성능 을 제공하는 빌리언 스케일 모델 을 개발하는 것이 목표입니다. 특히 장기 예측의 정확도를 개선하는 데 중점을 둡니다.

#Review #Time Series Forecasting #Foundation Model #Mixture-of-Experts (MoE)#Serial Scaling #Transformer #Pre-training #Probabilistic Forecasting #Data Augmentation

2026년 3월 5일

[논문리뷰] Qwen3-Coder-Next Technical Report

본 논문은 코딩 에이전트에 특화된 오픈-웨이트 언어 모델인 Qwen3-Coder-Next 를 소개합니다. 800억 개의 총 파라미터 중 추론 시 30억 개만 활성화 되는 MoE(Mixture-of-Experts) 아키텍처를 통해 효율적인 추론과 강력한 코딩 능력을 동시에 달성하는 것을 목표로 합니다.

#Review #Coding Agents #Large Language Models (LLMs)#Mixture-of-Experts (MoE)#Agentic Training #Software Engineering #Reinforcement Learning #Code Generation #Tool Usage

2026년 3월 3일

[논문리뷰] Beyond Language Modeling: An Exploration of Multimodal Pretraining

본 논문은 기존 언어 모델링의 한계를 넘어, 비전 신호를 퍼스트 클래스 시민 으로 통합한 통합 멀티모달 사전 훈련(unified multimodal pretraining) 의 설계 공간을 탐색하고 경험적 명확성을 제공하는 것을 목표로 합니다.

#Review #Multimodal Pretraining #Vision-Language Models #Mixture-of-Experts (MoE)#Representation Autoencoders (RAE)#World Modeling #Scaling Laws #Diffusion Models #Unified Architectures

2026년 3월 3일

[논문리뷰] Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

대규모 언어 모델(LLM) 사전 학습에 필요한 막대한 GPU 메모리 및 통신 대역폭 요구 사항으로 인한 중앙 집중식 학습의 한계를 극복하는 것입니다.

#Review #Decentralized Training #Mixture-of-Experts (MoE)#Large Language Models (LLMs)#Memory Efficiency #Sparse Expert Synchronization #Federated Learning #Distributed GPUs

2026년 2월 12일

[논문리뷰] Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

본 논문은 11B 활성화 파라미터 를 가진 196B Mixture-of-Experts (MoE) 모델 인 Step 3.5 Flash 를 소개하며, 첨단 에이전트 지능과 컴퓨팅 효율성 간의 격차를 해소하는 것을 목표로 합니다.

#Review #Mixture-of-Experts (MoE)#Sparse Models #Inference Efficiency #Hybrid Attention #Multi-Token Prediction (MTP)#Reinforcement Learning (RL)#Agentic AI #Long-Context Understanding

2026년 2월 11일

[논문리뷰] OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

본 논문은 MoE 아키텍처에서 전문가 전문화의 세분성과 하드웨어 실행 효율성 사이의 본질적인 trade-off를 해결하는 것을 목표로 합니다.

#Review #Mixture-of-Experts (MoE)#Fine-Grained Experts #Efficient Architectures #Transformer #Routing Algorithms #Hardware Acceleration #Sparse Models

2026년 2월 8일

[논문리뷰] Scaling Embeddings Outperforms Scaling Experts in Language Models

이 논문은 대규모 언어 모델(LLMs)에서 Mixture-of-Experts (MoE) 아키텍처가 겪는 효율성 한계를 극복하기 위해 임베딩 스케일링 을 새로운 희소성 스케일링 차원으로 탐구하는 것을 목표로 합니다.

#Review #Embedding Scaling #N-gram Embedding #Mixture-of-Experts (MoE)#Large Language Models (LLMs)#Parameter Efficiency #Inference Optimization #Speculative Decoding

2026년 1월 29일

[논문리뷰] LongCat-Flash-Thinking-2601 Technical Report

본 논문은 장기적인 상호작용과 추론이 요구되는 에이전트 태스크 에서 기존 모델들의 한계를 극복하고, 뛰어난 에이전트 추론 능력을 가진 오픈소스 MoE(Mixture-of-Experts) 대규모 언어 모델인 LongCat-Flash-Thinking-2601 을 개발하는 것을 목표로 합니다.

#Review #Agentic AI #Large Language Models (LLMs)#Mixture-of-Experts (MoE)#Reinforcement Learning (RL)#Context Management #Scalable Training #Test-Time Reasoning #Open-Source Model

2026년 1월 25일

[논문리뷰] The Illusion of Specialization: Unveiling the Domain-Invariant 'Standing Committee' in Mixture-of-Experts Models

본 연구는 MoE(Mixture-of-Experts) 모델 이 희소 라우팅을 통해 도메인 특화(domain specialization)를 달성한다는 일반적인 가정에 의문을 제기합니다.

#Review #Mixture-of-Experts (MoE)#Sparse Routing #Domain Specialization #Load Balancing #Interpretability #Standing Committee #LLM

2026년 1월 8일

[논문리뷰] K-EXAONE Technical Report

LG AI Research는 K-EXAONE 이라는 대규모 다국어 언어 모델을 개발하여 최첨단 성능을 달성하는 것을 목표로 합니다. 특히, 기존 모델의 한계를 극복하고 한국의 AI 인프라 환경을 고려하여 효율적이면서도 강력한 범용 및 전문 AI 기반 모델을 제공하고자 합니다.

#Review #Multilingual Language Model #Mixture-of-Experts (MoE)#Long Context #AI Safety #Korean AI #Foundation Model #Reinforcement Learning (RL)

2026년 1월 5일

[논문리뷰] Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

본 논문은 Mixture-of-Experts (MoE) 모델에서 라우터의 결정이 개별 전문가의 실제 역량과 충분히 연동되지 않아 발생하는 성능 한계를 해결하고자 합니다. 라우터와 전문가 간의 약한 결합 문제를 개선하여 모델 성능을 향상시키는 동시에 효율성을 유지하는 가벼운 보조 손실 함수를 제안하는 것이 목표입니다.

#Review #Mixture-of-Experts (MoE)#Router-Expert Coupling #Auxiliary Loss #Expert Specialization #Large Language Models (LLMs)#Computational Efficiency

2025년 12월 29일

[논문리뷰] Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

본 논문은 LLM 기반 RL의 불안정성 문제를 해결하고, 시퀀스 레벨 보상을 토큰 레벨 최적화 목표로 효과적으로 근사하여 최적화할 수 있는 조건을 밝히는 것을 목표로 합니다. 특히, MoE 모델에서 동적 전문가 라우팅이 학습 안정성에 미치는 영향을 분석하고, 이를 완화하기 위한 실용적인 방법을 제시합니다.

#Review #Reinforcement Learning (RL)#Large Language Models (LLMs)#Policy Gradient #REINFORCE #Mixture-of-Experts (MoE)#Training Stability #Importance Sampling #Routing Replay #Off-policy Learning

2025년 12월 1일

[논문리뷰] Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

본 논문은 언어 중심의 접근 방식을 통해 멀티모달 이해, 추론 및 생성 능력을 통합하는 Uni-MoE-2.0-Omni 라는 효율적인 옴니모달 대규모 모델을 개발하는 것을 목표로 합니다.

#Review #Omnimodal Large Models #Mixture-of-Experts (MoE)#Language-Centric AI #Multimodal Understanding #Multimodal Generation #Progressive Training #Omni-Modality 3D RoPE

2025년 11월 17일

[논문리뷰] Virtual Width Networks

본 논문은 Transformer 모델의 히든 차원을 늘릴 때 발생하는 Quadratic한 계산 비용 문제를 해결하면서도, 더 넓은 표현(wider representations)이 제공하는 이점을 얻는 것을 목표로 합니다.

#Review #Virtual Width Networks #Transformer #Mixture-of-Experts (MoE)#Scaling Laws #Representation Learning #Model Efficiency #Multi-Token Prediction #Hyper-Connections

2025년 11월 16일

[논문리뷰] Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs

MoE LLM의 라우터가 최적의 라우팅 대비 10-20%의 성능 격차 를 보이며, 태스크 임베딩 매니폴드와 라우팅 가중치 매니폴드 간의 misalignment로 인해 일반화 성능이 저하되는 문제를 해결하는 것을 목표로 합니다. 이를 통해 MoE LLM의 라우팅 효율성과 일반화 성능을 향상시키고자 합니다.

#Review #Mixture-of-Experts (MoE)#Large Language Models (LLMs)#Router Optimization #Manifold Regularization #Generalization #Post-training Fine-tuning #Task Embedding Alignment

2025년 11월 10일

[논문리뷰] LongCat-Flash-Omni Technical Report

LongCat-Flash-Omni는 560B 파라미터 규모의 최첨단 오픈소스 옴니모달 모델로, 견고한 오프라인 멀티모달 이해와 저지연 실시간 오디오-시각 상호작용 을 통합하는 것을 목표로 합니다.

#Review #Omni-modal AI #Multimodal LLM #Real-time Interaction #Mixture-of-Experts (MoE)#Streaming Inference #Distributed Training #Curriculum Learning #Audio-Visual Perception

2025년 11월 9일

[논문리뷰] EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence

본 연구는 의사 전문성에 크게 의존하고 주관적이며 비효율적인 기존 초음파 진단의 한계를 극복하고, 일반적인 VLM(Vision-Language Model) 의 초음파 의료 도메인 지식 부족 문제를 해결하고자 합니다.

#Review #Vision-Language Models #Ultrasound Imaging #Medical Diagnosis #Mixture-of-Experts (MoE)#Instruction Tuning #Multimodal AI #Report Generation #VQA

2025년 9월 19일

[논문리뷰] Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

본 논문은 MoE(Mixture-of-Experts) 언어 모델에서 스파시티(sparsity)가 기억(memorization) 능력과 추론(reasoning) 능력에 미치는 영향을 규명하고, 고정된 연산 예산(compute budget) 내에서 태스크별 최적의 스파시티 구성을 찾는 것을 목표로 합니다.

#Review #Mixture-of-Experts (MoE)#Sparsity #Scaling Laws #Reasoning Tasks #Memorization #Large Language Models #Generalization Gap #Top-k Routing

2025년 8월 27일

[논문리뷰] Intern-S1: A Scientific Multimodal Foundation Model

본 논문은 과학 분야에서 오픈 소스 파운데이션 모델과 클로즈드 소스 모델 간의 성능 격차를 줄이고자 합니다.

#Review #Multimodal Foundation Model #Scientific AI #Reinforcement Learning #Mixture-of-Experts (MoE)#Dynamic Tokenizer #Data Curation #Low-Resource Learning

2025년 8월 22일

[논문리뷰] MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

대규모 MoE 기반 LLM(예: DeepSeek-V3-0324 , Kimi-K2-Instruct )의 막대한 메모리 요구사항으로 인한 배포 병목 현상을 해결하고자 합니다.

#Review #Mixture-of-Experts (MoE)#LLM Compression #Matrix Decomposition #Parameter Efficiency #Deep Learning #Memory Optimization

2025년 8월 12일

[논문리뷰] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

본 논문은 로봇이 실제 환경에서 효과적으로 작동하기 위해 멀티모달 추론과 정확한 동작 생성을 통합하는 문제를 해결하고자 합니다.

#Review #Vision-Language-Action (VLA)#Instruction Tuning #Multimodal Reasoning #Robotic Manipulation #Catastrophic Forgetting #Mixture-of-Experts (MoE)#Flow Matching

2025년 8월 5일

[논문리뷰] Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

본 논문은 Mixture-of-Experts(MoE)를 Diffusion Transformers(DiTs)에 적용할 때 발생하는 제한적인 성능 향상 문제를 해결하는 것을 목표로 합니다.

#Review #Mixture-of-Experts (MoE)#Diffusion Transformers (DiTs)#Routing Guidance #Semantic Specialization #Contrastive Learning #Image Generation #Flow Matching

2025년 10월 29일

[논문리뷰] Rewiring Experts on the Fly:Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

MoE(Mixture-of-Experts) 모델이 배포 시 발생하는 분포 변화(distribution shifts) 로 인해 차선적인 라우팅 결정(suboptimal routing decisions) 을 겪는 문제를 해결하는 것이 목표입니다.

#Review #Mixture-of-Experts (MoE)#Online Adaptation #Test-Time Adaptation (TTA)#Expert Routing #Large Language Models (LLMs)#Self-Supervision #Computational Efficiency #Context Shift Robustness

2025년 10월 20일