#Video Models

6 posts

[Paper Review] Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

This paper aims to apply the spatiotemporal priors of large-scale pretrained video generation models to robot policy learning.

#Review #Video Models #Visuomotor Control #Robot Policy #Fine-tuning #Diffusion Models #World Models #Model-based Planning #Imitation Learning

January 22, 2026

[Paper Review] CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

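This paper aims to use video models as "pure visual reasoners" for text-to-image (T2I) generation, addressing two problems in existing T2I models: the lack of a starting point for visual reasoning and the ambiguity of intermediate steps.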

#Review #Text-to-Image Generation #Video Models #Visual Reasoning #Chain-of-Frame (CoF) #Progressive Refinement #Diffusion Models #CoF-Evol-Instruct

January 15, 2026

[Paper Review] World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

This paper seeks to address two problems common in state-of-the-art controllable video models: hallucination and a limited ability to express uncertainty.

#Review #Controllable Video Generation #Uncertainty Quantification #Video Models #Calibration #Out-of-Distribution Detection #Proper Scoring Rules #Latent Space

December 7, 2025

[Paper Review] iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

iMontage presents a unified framework for highly dynamic many-to-many image generation by repurposing a pretrained video model.

#Review #Image Generation #Video Models #Diffusion Models #Many-to-many #Unified Framework #Temporal Consistency #Image Editing #Positional Embedding

November 25, 2025

[Paper Review] Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

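This paper addresses the lack of a comprehensive benchmark for systematically evaluating the reasoning abilities of video models, in particular their ability to reason through video generation.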

#Review #Video Models #Spatial Reasoning #Maze Solving #Video Generation #Benchmark #Supervised Fine-tuning #Test-Time Scaling #Multimodal Reasoning

November 19, 2025

[Paper Review] Video models are zero-shot learners and reasoners

This paper hypothesizes that video models can become general-purpose vision foundation models, much as large language models (LLMs) have for language understanding.

#Review #Video Models #Zero-shot Learning #Visual Reasoning #Foundation Models #Generative AI #Perception #Manipulation #Modeling

September 25, 2025