#Dataset Curation

8 posts

[Paper Review] Relational Visual Similarity

This work addresses a limitation of existing image similarity models: because they focus only on perceptual attributes, they fail to capture the abstract, relational visual similarity that humans perceive.

#Review #Relational Similarity #Visual Similarity #Vision-Language Models #Anonymous Captioning #Image Retrieval #Analogical Reasoning #Dataset Curation

December 8, 2025

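[Paper Review] Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

This paper aims to generate a physically plausible, temporally consistent dynamic 4D scene (3D geometry plus temporal dynamics) from a single static image. It seeks to overcome the spatiotemporal inconsistency and poor generalization that arise in the existing paradigm of decoupling geometry from motion.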

#Review #4D Synthesis #3D Reconstruction #Motion Generation #Single Image #Diffusion Model #Point Cloud #Dataset Curation #View Synthesis

December 7, 2025

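[Paper Review] Loomis Painter: Reconstructing the Painting Process

This paper aims to generate a realistic, consistent step-by-step painting process for any input image, addressing the temporal discontinuity, structural inconsistency, and poor generalization across artistic media that affect existing generative models.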

#Review #Painting Process Generation #Video Diffusion Models #Media Transfer #Reverse Painting #Dataset Curation #Perceptual Distance Profile #Artistic Workflow #Generative AI

November 23, 2025

[Paper Review] CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

This work aims to develop CHURRO, an open-weight large vision-language model (VLM), to raise text recognition accuracy on historical documents while lowering cost.

#Review #Historical Text Recognition #Vision-Language Model #Open-Weight Model #OCR #Cultural Heritage #Low-Cost AI #Dataset Curation #Fine-tuning

September 29, 2025

[Paper Review] OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

This work aims to overcome the scarcity of diverse document layout data and the limitations of existing document layout generation methods in complex, long-sequence scenarios.

#Review #Document Layout Generation #Large Language Models (LLMs) #Coarse-to-Fine Learning #Dataset Curation #OmniLayout-1M #Document AI #Generative Models

October 31, 2025

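[Paper Review] LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL

This paper seeks to resolve structural and annotation problems in the original WikiSQL dataset, including data type mismatches, inconsistent casing, syntax errors, and unanswerable questions.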

#Review #Text-to-SQL #WikiSQL #LLM #Dataset Curation #Natural Language Processing #Benchmark #SQL Generation #Data Cleaning

October 7, 2025

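[Paper Review] Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

This work addresses the problem that existing video reasoning models provide only text-based reasoning and do not indicate when and where the key evidence appears.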

#Review #Video Reasoning #Spatio-Temporal Grounding #Large Multimodal Models #Reinforcement Learning #Chain-of-Thought #Visual Evidence #Dataset Curation

October 24, 2025

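[Paper Review] Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

This paper aims to remove a bottleneck in research progress caused by the lack of large-scale, high-quality, publicly accessible datasets for text-guided image editing. By providing a comprehensive and diverse dataset built from real images, it seeks to lay a solid foundation for training and benchmarking next-generation text-guided image editing models.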

#Review #Text-Guided Image Editing #Large-Scale Dataset #Multimodal Models #Dataset Curation #Quality Control #Prompt Engineering #Preference Learning #Multi-Turn Editing

October 23, 2025