#Multi-modal Large Language Models (MLLMs)

2 posts

[Paper Review] Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

Recent advances in Multi-modal Large Language Models (MLLMs) have brought substantial progress in general-purpose video understanding. However, these models still struggle severely to process long-form, high-resolution video.

#Review #Video Understanding #Multi-modal Large Language Models (MLLMs) #Vision Transformers (ViTs) #Autoregressive Gazing #Token Reduction #Multi-scale Patches #High-Resolution Video #Long-Form Video

March 24, 2026

[Paper Review] ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

The paper focuses on the problem that existing image generation and unified models fall short on complex tasks that require deep reasoning, planning, and precise data-to-visual mapping.

#Review #Table Visualization #Infographic Generation #Multi-modal Large Language Models (MLLMs) #Diffusion Models #Self-Correction #Reinforcement Learning #Graphic Design #Data-to-Visual Mapping

December 16, 2025