#Document Understanding

8 posts

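[Paper Review] DocAtlas: Multilingual Document Understanding Across 80+ Languages

This paper proposes DocAtlas to overcome the limitations that existing Document Understanding models face in handling multilingual data and recognizing document structure. Most existing models are biased toward particular language families or suffer from a generalization problem, with performance degrading on complex document layouts.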

#Review #Document Understanding #Multilingual #Vision-Language Models #OCR #Multimodal Learning

May 19, 2026

[Paper Review] OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

This paper proposes OCRVerse, which it presents as the first end-to-end unified OCR method, overcoming the limitations of existing fragmented OCR approaches by integrating text-centric and vision-centric OCR capabilities.

#Review #Holistic OCR #Vision-Language Models #Multi-domain Training #Text-centric OCR #Vision-centric OCR #SFT-RL #Code Generation #Document Understanding

January 29, 2026

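[Paper Review] DeepSeek-OCR 2: Visual Causal Flow

This paper addresses the problem that existing Vision-Language Models (VLMs) process visual tokens in a fixed raster-scan order, which conflicts with the flexible way humans perceive visual information.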

#Review #OCR #Vision-Language Model #Causal Reasoning #Transformer Architecture #Attention Mechanism #Document Understanding #DeepEncoder

January 28, 2026

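[Paper Review] Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

This paper aims to address the limitations that existing VLMs face with low-resource languages such as Thai, which stem from complex script characteristics (non-Latin characters, no explicit word boundaries, stacked diacritics) and unstructured document layouts.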

#Review #Vision-Language Model #OCR #Thai Language Processing #Document Understanding #Low-Resource Language #Data Synthesis #Fine-tuning #Layout Analysis

January 21, 2026

[Paper Review] LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

This paper proposes LightOnOCR-2-1B, a 1-billion-parameter end-to-end multilingual vision-language model that converts document images into clean, naturally ordered text without a complex multi-stage OCR pipeline.

#Review #OCR #Vision-Language Model #End-to-End Learning #Multilingual #Reinforcement Learning #Document Understanding #Bounding Box Prediction #Task Arithmetic Merging

January 20, 2026

[Paper Review] Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR

This paper tackles the challenge of Arabic document OCR, which is difficult because of cursive script, diverse fonts, diacritics, and right-to-left text direction.

#Review #Arabic OCR #Vision-Language Model #Fine-tuning #Document Understanding #Markdown Conversion #Benchmark

September 24, 2025

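[Paper Review] Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?

This paper addresses the problem that current evaluation benchmarks for document Retrieval-Augmented Generation (RAG) systems fail to adequately reflect real-world complexity and constraints.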

#Review #Retrieval-Augmented Generation #Multimodal LLMs #Benchmark Evaluation #Document Understanding #Multi-hop Reasoning #Information Retrieval #Evaluation Dataset

August 8, 2025

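[Paper Review] Document Understanding, Measurement, and Manipulation Using Category Theory

This paper aims to develop a mathematical framework that uses category theory to extract the structure of documents, measure their information content, and enable manipulations such as summarization and extension (exegesis).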

#Review #Category Theory #Document Understanding #Large Language Models #Information Theory #Rhetorical Structure Theory #Document Summarization #Rate Distortion Analysis #Self-supervised Learning

October 27, 2025