#Visual Embeddings

1개의 포스트

[논문리뷰] Reconstruction Alignment Improves Unified Multimodal Models

논문은 통합 멀티모달 모델(UMM)이 이미지-텍스트 쌍으로 훈련될 때 캡션의 희소성으로 인해 미세한 시각적 디테일을 놓치고, 이해와 생성 간의 정렬이 불완전하다는 문제를 해결하고자 합니다.

#Review #Unified Multimodal Models #Image Generation #Image Editing #Post-training #Self-supervised Learning #Reconstruction Alignment #Visual Embeddings

2025년 9월 10일