#Video Dataset

2개의 포스트

[논문리뷰] Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

논문은 기존 비디오 이해 데이터셋이 자연스러운 장기적 일상생활을 반영하지 못하고 짧은 클립 위주라는 한계를 지적하며, 진정한 다중 모드 평생 이해(Multimodal Lifelong Understanding) 태스크를 엄격하게 정의하는 것을 목표로 합니다.

#Review #Multimodal Lifelong Understanding #Video Dataset #Agentic AI #Dynamic Memory Management #Long-Context MLLMs #Temporal Reasoning #Concept Drift

2026년 3월 5일

[논문리뷰] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

본 논문은 대규모의 실세계 동적 비디오 데이터셋에 부족한 명시적인 공간 정보 및 풍부한 의미론적 주석의 부재 문제를 해결하고자 합니다. 이는 3D 재구성, 세계 모델링, 그리고 동적 장면 합성과 같은 AI/ML 분야의 발전을 저해하며, 물리적으로 일관성 있는 모델 학습을 위한 핵심 자원의 필요성을 강조합니다.

#Review #Video Dataset #Spatial Annotation #Camera Pose Estimation #Depth Map #Structured Caption #Motion Instruction #3D Vision #World Modeling

2025년 9월 12일