#Region-Text Matching

1 post

[Paper Review] FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

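Existing vision-language models (VLMs) handle large-scale global alignment well, but they struggle to capture fine-grained details such as object attributes, spatial relations, and subtle linguistic expressions, and they offer limited multilingual support in non-English settings (especially Chinese). The paper aims to address both problems.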

#Review #Vision-Language Alignment #Fine-grained Understanding #Bilingual Model #Contrastive Learning #Multimodal Retrieval #Open-Vocabulary Detection #Region-Text Matching

October 16, 2025