#Text-to-Image Conversion

1 post

[Paper Review] Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

This paper aims to systematically diagnose and address the "modality gap" that arises when Multimodal Large Language Models (MLLMs) process text rendered as images.

#Review #Multimodal LLMs #Modality Gap #Visual Text Understanding #Error Analysis #Self-Distillation #Text-to-Image Conversion #Reasoning Collapse

March 10, 2026