#Dynamic Routing

5 posts

[Paper Review] BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

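This paper proposes BEAM to address the computational redundancy caused by the fixed Top-K routing of standard MoE models. The conventional Top-K mechanism assigns the same number of experts to every token regardless of that token's complexity, which incurs unnecessary computation.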

#Review #Mixture-of-Experts #Dynamic Routing #Expert Sparsity #Inference Acceleration #Binary Expert Activation Masking #vLLM

May 14, 2026
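To make the redundancy described in the BEAM summary concrete, here is a minimal sketch of standard fixed Top-K routing for a single token. It is not BEAM's method; the function name `top_k_route`, the 4-expert setup, and the logit values are hypothetical, chosen only to show that a confident token and an uncertain token both activate exactly `k` experts.

```python
import math

def top_k_route(logits, k):
    """Standard fixed Top-K MoE routing for one token (illustrative sketch).

    Returns (expert_index, gate_weight) pairs; the gate weights are a softmax
    taken over the k selected logits only.
    """
    # Indices of the k largest router logits, highest first.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the chosen experts (subtract the max for stability).
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Hypothetical router logits over 4 experts.
easy_token = [8.0, 0.1, 0.0, -0.2]   # router is very confident in expert 0
hard_token = [1.0, 0.9, 1.1, 0.95]   # router is uncertain

for name, logits in [("easy", easy_token), ("hard", hard_token)]:
    routes = top_k_route(logits, k=2)
    # Both tokens activate exactly k=2 experts, whatever their difficulty.
    print(name, [i for i, _ in routes], [round(w, 3) for _, w in routes])
```

For the "easy" token, the second expert receives a gate weight of roughly 0.0004, yet its full forward pass is still computed. That per-token waste is what dynamic-routing schemes try to avoid.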

[Paper Review] Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

This paper aims to address the quadratic computational complexity of long-context LLM inference and the limitations of existing hybrid attention methods.

#Review #Large Language Models #Long-context Inference #Hybrid Attention #Dynamic Routing #Layer-level Sparsity #Context-aware

April 9, 2026

[Paper Review] GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

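The goal is to address the heavy inference latency and computational cost that large reasoning models (LRMs) incur when generating multi-step chains of thought. The paper seeks to cut the inefficient overhead introduced by existing collaborative inference methods while effectively predicting the difficulty of each reasoning step and assigning it to an appropriate model (lightweight or large).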

#Review #Collaborative Inference #Large Reasoning Models (LRMs) #Inference Latency #Step-wise Routing #Initial Token Entropy #Dynamic Routing #Computational Efficiency

January 12, 2026

[Paper Review] UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

This work aims to solve the long-standing challenge of unifying speech and music generation.

#Review #Mixture of Experts #Speech Generation #Music Generation #Multimodal AI #Dynamic Routing #Training Curriculum #Data Imbalance #Audio Synthesis

October 16, 2025

[Paper Review] Dr.LLM: Dynamic Layer Routing in LLMs

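The goal is to address two problems that arise because large language models (LLMs) pass every input token through the same fixed stack of all layers: inefficiency (wasted computation on easy tasks) and a lack of flexibility on complex reasoning tasks.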

#Review #Dynamic Routing #LLMs #Adaptive Depth #Computational Efficiency #Monte Carlo Tree Search (MCTS) #Retrofittable Framework #Supervised Learning #Accuracy Improvement

October 15, 2025