#Discrete Diffusion Models

9 posts

[Paper Review] Hierarchical Codec Diffusion for Video-to-Speech Generation

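This paper aims to address a limitation of existing VTS methods: by overlooking the hierarchical structure of speech, they struggle to align visual information with speech features effectively.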

#Review #Video-to-Speech #Discrete Diffusion Models #Hierarchical Modeling #Audio-Visual Alignment #Residual Vector Quantization #Transformer

April 19, 2026

[Paper Review] Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Discrete diffusion models can generate high-quality data, but they typically require many sampling steps, which leads to high computational cost and FLOPs.

#Review #Discrete Diffusion Models #Distillation #Moment Matching Distillation #D-MMD #GPT-2 Gradient Moment #Few-step Generators #CIFAR-10 #Open Web Text

March 22, 2026

[Paper Review] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

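This paper aims to overcome the limitations of the autoregressive architecture that existing multimodal large language models (MLLMs) mainly rely on, and to explore a new probabilistic modeling alternative that can unify understanding and generation across text, speech, and images.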

#Review #Multimodal AI #Discrete Diffusion Models #Masked Language Modeling #Unified Generative Models #Any-to-Any #Speech-to-Image #Visual Question Answering

March 10, 2026

[Paper Review] Balancing Understanding and Generation in Discrete Diffusion Models

This paper aims to resolve, within discrete diffusion models (DDMs), the imbalance between the semantic understanding ability of Masked Diffusion Language Models (MDLM) and the high-quality few-step generation ability of Uniform-noise Diffusion Language Models (UDLM).

#Review #Discrete Diffusion Models #Language Modeling #Image Generation #Masked Diffusion #Uniform Noise #XDLM #Stationary Noise Kernel #Pareto Frontier

February 3, 2026

[Paper Review] Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

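This paper aims to address the main inefficiencies of Masked Diffusion Models (MDMs), namely slow inference caused by the lack of KV caching support and the unnecessary processing of mask tokens. In particular, it proposes a new modeling framework that substantially improves efficiency across multimodal tasks without degrading performance.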

#Review #Discrete Diffusion Models #Multimodal Models #Sparse Parameterization #KV Caching #Token Truncation #Image Generation #Image Editing #Visual Reasoning

December 16, 2025

[Paper Review] Scaling Behavior of Discrete Diffusion Language Models

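This paper systematically studies the scaling behavior of Discrete Diffusion Language Models (DLMs), with the goal of evaluating their competitiveness with Autoregressive Language Models (ALMs) and addressing key limitations of DLMs (e.g., the lack of parallel generation and revision capabilities).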

#Review #Discrete Diffusion Models #Scaling Laws #Language Models #Masked Diffusion #Uniform Diffusion #Hyperparameter Tuning #Compute-Optimal Training

December 14, 2025

[Paper Review] From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

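This paper aims to solve the error cascade problem caused by the train-inference discrepancy in vision-language diffusion models. In particular, it targets the case where, during parallel decoding, early token errors contaminate the entire generation context and cause syntactic errors and semantic hallucinations.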

#Review #Discrete Diffusion Models #Vision-Language Models #Error Cascades #Self-Correction #Refinement Framework #Parallel Generation #Image Captioning #Hallucination Mitigation

October 27, 2025

[Paper Review] Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

This paper aims to solve the "sampling wall" problem, a major limitation of discrete diffusion models.

#Review #Discrete Diffusion Models #Sampling Wall #Loopholing #Self-Conditioning #Non-Autoregressive Generation #Text Generation #Language Modeling #Reasoning Tasks

October 24, 2025

[Paper Review] Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation

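This paper aims to resolve the modality-specific fragmentation of existing medical AI models and to develop a general-purpose medical AI agent capable of unified generation across medical images (radiology, pathology) and clinical reports.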

#Review #Discrete Diffusion Models #Multimodal Large Language Models (MLLMs) #Medical Image Generation #Medical Report Generation #Multimodal Generation #Medical AI #Cross-modal Alignment

October 8, 2025