#Sparse Models

3개의 포스트

[논문리뷰] Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

본 논문은 11B 활성화 파라미터 를 가진 196B Mixture-of-Experts (MoE) 모델 인 Step 3.5 Flash 를 소개하며, 첨단 에이전트 지능과 컴퓨팅 효율성 간의 격차를 해소하는 것을 목표로 합니다.

#Review #Mixture-of-Experts (MoE)#Sparse Models #Inference Efficiency #Hybrid Attention #Multi-Token Prediction (MTP)#Reinforcement Learning (RL)#Agentic AI #Long-Context Understanding

2026년 2월 11일

[논문리뷰] OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

본 논문은 MoE 아키텍처에서 전문가 전문화의 세분성과 하드웨어 실행 효율성 사이의 본질적인 trade-off를 해결하는 것을 목표로 합니다.

#Review #Mixture-of-Experts (MoE)#Fine-Grained Experts #Efficient Architectures #Transformer #Routing Algorithms #Hardware Acceleration #Sparse Models

2026년 2월 8일

[논문리뷰] UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

본 논문은 Mixture of Experts (MoE) 모델이 겪는 높은 메모리 접근 비용 문제를 해결하고, 기존 메모리 레이어 아키텍처인 UltraMem이 8-expert MoE 모델 성능에 미치지 못하는 격차를 해소하는 것을 목표로 합니다.

#Review #Memory Networks #Mixture of Experts (MoE)#Long-Context Learning #Sparse Models #Transformer Architecture #LLMs #Efficient Inference

2025년 8월 27일