#Visual Document Understanding

2 posts

[Paper Review] Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations

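This paper aims to resolve the severe storage-efficiency bottleneck in multi-vector visual document retrieval (VDR) systems while also improving retrieval performance. It targets two weaknesses of existing multi-vector models: the heavy storage cost of patch-based embeddings and the lack of explicit grounding in document layout structure.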

#Review #Multi-Vector Retrieval #Visual Document Understanding #Document Parsing #Layout-Informed Embeddings #Information Bottleneck #Storage Efficiency #Late Interaction

March 8, 2026

[Paper Review] Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

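This work addresses three shortcomings of existing vision-language models (VLMs): they are constrained in parameter scale, they lack robust self-correction ability, and they perform poorly on document-based tasks that require long visual contexts and complex reasoning.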

#Review #Visual Document Understanding #Visual Question Answering #Multi-Agent System #Test-Time Scaling #Self-Correction #Mixed Reward Modeling #Large Language Models

August 8, 2025