#Unified Multimodal Model

5 posts

[Paper Review] Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

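Existing text-to-image (T2I) models excel at generating high-quality images, but they suffer from identity drift and hallucination when a prompt requires external world knowledge, such as long-tail concepts, specific people, or cultural symbols that are absent from the training data.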

#Review #Multimodal Agent #World-Grounded Image Synthesis #FactIP #Agentic Pipeline #Unified Multimodal Model #Evidence-Grounded Recaptioning

March 31, 2026

[Paper Review] UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations

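This work addresses limitations of existing unified multimodal models. Specifically, it aims to overcome the loss of fine-grained semantic information caused by discrete visual tokenizers, as well as the training instability and slow convergence that arise when continuous, high-dimensional visual representations are modeled directly.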

#Review #Unified Multimodal Model #Image Generation #Image Understanding #Semantic Compression #Continuous Representation #Diffusion Model #Transformer #Image Editing

March 11, 2026

[Paper Review] EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning

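This paper addresses the fragmentation of image and video generation and editing tasks, which stems from architectural limitations and data scarcity. It proposes EditVerse, a framework that unifies image and video editing and generation within a single model, aiming to handle diverse modalities flexibly through in-context learning.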

#Review #Unified Multimodal Model #In-Context Learning #Image and Video Editing #Video Generation #Full Self-Attention #Rotary Positional Embedding #Cross-Modal Knowledge Transfer

September 25, 2025

[Paper Review] Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

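The paper aims to overcome the limitations of prior approaches that treated camera-centric scene understanding and generation as separate problems by unifying them in a single multimodal model.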

#Review #Unified Multimodal Model #Camera-Centric #Image Understanding #Image Generation #Spatial Reasoning #Camera Parameters #Instruction Tuning #Multimodal Spatial Intelligence

October 13, 2025

[Paper Review] UniVideo: Unified Understanding, Generation, and Editing for Videos

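Existing unified multimodal models are largely confined to the image domain, while video tasks still rely on task-specific specialist models; this work seeks to overcome that limitation. It aims to develop a versatile model that performs unified video understanding, generation, and editing within a single framework.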

#Review #Unified Multimodal Model #Video Generation #Video Editing #MLLM #Diffusion Transformer #In-Context Learning #Zero-shot Generalization #Multimodal AI

October 10, 2025