#Video-Based Spatial Intelligence

1개의 포스트

[논문리뷰] MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

본 논문은 MLLM(Multi-modal Large Language Models)이 물리적 환경에서 일반적인 비서 역할을 수행하기 위해 필수적인 비디오 기반 공간 지능 을 평가할 수 있는 포괄적인 벤치마크의 부재를 해결하고자 합니다.

#Review #Video-Based Spatial Intelligence #MLLM Benchmark #Spatial Reasoning #Multi-Modal Learning #Perception #Planning #Prediction #Cross-Video Reasoning #Human-AI Gap

2025년 12월 17일