#Modality Gap Mitigation

2 posts

[Paper Review] V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation

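This paper addresses a limitation of existing Text-to-Music (T2M) models, namely their lack of precise control over temporal alignment with video events, and proposes V2M-ZERO, a zero-pair video-to-music generation approach.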

#Review #Video-to-Music Generation #Temporal Alignment #Zero-Pair Learning #Rectified Flow Model #Diffusion Transformer #Event Curves #Modality Gap Mitigation

March 11, 2026

[Paper Review] One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework

This paper addresses the poor region-level captioning performance of existing image-centric zero-shot captioning models, which operate on the whole image.

#Review #Zero-Shot Captioning #Region-Level Captioning #Vision Transformers #DINOv2 #Patch-Centric #Modality Gap Mitigation #Visual-Language Models

October 13, 2025