#Multimodal Dataset

4 posts

[Paper Review] Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset

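The goal is to overcome the limitations of existing industrial defect inspection systems: high false-positive rates, poor adaptability, weak generalization, and the lack of interpretability of black-box models.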

#Review #Industrial Defect Detection #Multimodal Dataset #Vision-Language Model #Diffusion Model #Open-Vocabulary Learning #Quality Inspection #Data Efficiency #Foundation Model

January 8, 2026

[Paper Review] MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts

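The goal is to address the limitations of existing 3D city generation methods, which fall short on the creative flexibility of text-based generation, object-level editability, and structural consistency.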

#Review #3D City Generation #Natural Language Processing #Aesthetic Adaptation #Controllable Assets #Layout Generation #Interactive Editing #Diffusion Models #Multimodal Dataset

November 25, 2025

[Paper Review] EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation

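The paper sets out to address two problems: existing video generation systems neglect the emotional dimension, and the gap between emotion understanding and generation is large, especially for stylized or non-realistic content.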

#Review #Multimodal Dataset #Emotion Recognition #Video Generation #Affective Computing #Stylized Media #Diffusion Models #Video Understanding #Text-to-Video

November 16, 2025

[Paper Review] PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

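This paper aims to address the shortage of multimodal datasets for analyzing human behavioral traits, and to enable systematic cross-modal and causal studies by combining behavioral traits inferred by LLMs (Large Language Models) with visual and biographical attributes.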

#Review #Multimodal Dataset #LLM Inference #Behavioral Traits #Causal Representation Learning #Big Five #Multimodal AI #Causal Discovery #Human-Computer Interaction

September 16, 2025