#Egocentric Video

10 posts

[Paper Review] The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

This paper aims to resolve a serious bottleneck facing existing egocentric 4D hand motion reconstruction methods. Existing approaches either rely on image-based detectors or use temporal modules trained on limited data, so their performance degrades under heavy occlusion.

#Review #Video Diffusion Models #Hand Motion Reconstruction #Egocentric Video #4D Reconstruction #Embodied AI #Occlusion Reasoning

June 29, 2026

[Paper Review] HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

The key bottleneck in training embodied foundation models is the scarcity of precisely annotated, high-quality robot data and the high cost of data collection.

#Review #Embodied AI #Egocentric Video #Pretraining #Robot Learning #Scaling Laws #Generalization #World-Action Models

June 18, 2026

[Paper Review] EgoCS-400K: An Egocentric Gameplay Dataset for World Models

This paper aims to address the shortage of high-quality video-action-language datasets for training large-scale interactive world models.

#Review #World Models #Egocentric Video #Gaming Agent #Video Generation #Replay-grounded #Embodied AI

June 16, 2026

[Paper Review] EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

This paper tackles the challenge of building a "digital twin" by inferring the physical properties of complex deformable objects from a single egocentric RGB video of everyday interactions.

#Review #Physical Understanding #Real-to-sim #Egocentric Video #Deformable Objects #Digital Twin #Physics-based Simulation

June 15, 2026

[Paper Review] Rethinking RAG in Long Videos: What to Retrieve and How to Use It?

This paper aims to address two problems facing VideoRAG systems: opaque evaluation and the lack of an optimal retrieval strategy.

#Review #VideoRAG #Egocentric Video #V-RAGBench #CARVE #Chunk-Adaptive Reranking #Multimodal Retrieval #Long-form Video Understanding

June 14, 2026

[Paper Review] OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

This paper addresses the challenge that multimodal agents operating in real-time environments must continuously maintain and reason over spatial structure as it evolves over time, rather than relying on fragmentary information from the current moment.

#Review #Multimodal LLMs #Streaming Spatial Intelligence #Egocentric Video #Hierarchical Benchmark #Spatiotemporal Reasoning #Allocentric Mapping

June 3, 2026

[Paper Review] PhysBrain 1.0 Technical Report

This paper aims to overcome the limitations of collecting the platform-dependent robot trajectory data that existing VLA systems rely on, and to acquire a fundamental understanding of the physical environment (physical commonsense).

#Review #Vision-Language-Action Models #Embodied Intelligence #Physical Commonsense #Egocentric Video #Data Engine #VLA Adaptation

May 17, 2026

[Paper Review] Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning

Existing Multimodal Large Language Models (MLLMs) are overly anchored to 2D visual signals and fail to build structured abstractions of 3D environments, which makes 3D spatial reasoning difficult for them.

#Review #Multimodal Large Language Models (MLLMs) #Spatial Reasoning #Textual Representation #Allocentric Context #Egocentric Video #Prompting Methods #VSI-Bench #OST-Bench

March 25, 2026

[Paper Review] ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Latent world models, particularly JEPA-style models such as V-JEPA2, have shown promising ability to predict future world states from video observations.

#Review #Latent World Models #Vision-Language Models #Predictive Representation Learning #Dual-Temporal Sampling #Semantic Guidance #Trajectory Prediction #Egocentric Video #JEPA

March 24, 2026

[Paper Review] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution.

#Review #Multimodal AI Agents #Web-agent Benchmark #Egocentric Video #Visual Grounding #Online Evaluation #LLM-as-a-Judge #Perception-Action Alignment

March 24, 2026