#dLLM

2 posts

[Paper Review] dMoE: dLLMs with Learnable Block Experts

This paper aims to improve inference efficiency by addressing the excessive expert activation that occurs during block parallel decoding in MoE-based dLLMs.

#Review #dLLM #Mixture-of-Experts #Parallel Decoding #Block-level Routing #Expert Compression #Memory-bound

May 31, 2026

[Paper Review] LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Although Diffusion Large Language Models (dLLMs) have high potential for parallel inference, current confidence-driven decoding strategies remain at 1-3 TPF (Tokens Per Forward pass) and fail to fully exploit that parallelism.

#Review #dLLM #Parallel Decoding #Lookahead #Inference Acceleration #Token Filling Order #Branch Parallelism #Diffusion Models

December 22, 2025