#ElasticMoT

1 post

[Paper Review] Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation

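This paper proposes Lavida-O, a unified Masked Diffusion Model (MDM) that aims to overcome the limitations of existing multimodal MDMs and handle a broad range of multimodal tasks within a single framework, including image understanding, object grounding, image editing, and high-resolution (1024px) text-to-image generation.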

#Review #Multimodal AI #Masked Diffusion Models #Image Understanding #Image Generation #Image Editing #Object Grounding #ElasticMoT #Self-reflection

September 25, 2025