#Multi-Token Prediction (MTP)

2 posts

[Paper Review] The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

This paper addresses the efficiency and cost bottlenecks that arise as large language models (LLMs) shift toward long-horizon agentic workflows, as well as the difficulty of solving intrinsically complex, high-stakes tasks.

#Review #Mixture-of-Experts (MoE) #Mini Activations #Agentic AI #Self-Evolution #Reinforcement Learning (RL) #Multi-Token Prediction (MTP)

May 26, 2026

[Paper Review] Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

This paper introduces Step 3.5 Flash, a 196B-parameter Mixture-of-Experts (MoE) model with 11B active parameters, which aims to close the gap between frontier agentic intelligence and compute efficiency.

#Review #Mixture-of-Experts (MoE) #Sparse Models #Inference Efficiency #Hybrid Attention #Multi-Token Prediction (MTP) #Reinforcement Learning (RL) #Agentic AI #Long-Context Understanding

February 11, 2026