#Spatial Intelligence

15개의 포스트

[논문리뷰] SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

본 논문은 기존의 정적인 VQA나 시뮬레이터 종속적 벤치마크가 멀티모달 에이전트의 실제 환경에서의 동적 공간 추론 능력을 평가하는 데 한계가 있다는 점을 지적합니다. 대부분의 기존 연구는 privileged state 정보에 의존하거나 특정 환경에 고착화된 인터페이스를 사용하여 일반적인 공간 지능을 측정하기 어렵습니다 .

#Review #Spatial Reasoning #Multimodal Agents #Interactive Benchmark #Egocentric Vision #POMDP #Spatial Intelligence

2026년 6월 8일

[논문리뷰] Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

본 논문은 Spatial Intelligence를 구축하는 데 있어 VLM과 VGM 중 어느 사전 학습(Pre-training) 패러다임이 더 우수한 표현 체계(Representation substrate)를 제공하는지 분석한다 .

#Review #Spatial Intelligence #Vision-Language Models #Video Generation Models #Frozen-Feature Probing #Representation Learning #Semantic Tagging #3D Geometry Prediction

2026년 6월 1일

[논문리뷰] Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

본 논문은 Foundation Models가 수동적인 시각적 이해를 넘어, 능동적인 탐색을 통해 3D 공간에서 목표 시점을 정확히 재현할 수 있는지 질문합니다 . 기존 연구들은 주로 사전에 수집된 데이터에 의존하여 '무엇이 어디에 있는가'를 묻는 정적인 공간 지능에 집중해 왔습니다.

#Review #Target Viewpoint Reproduction #TVRBench #Active Exploration #Foundation Models #Spatial Intelligence #Embodied AI #GRPO #SFT

2026년 6월 1일

[논문리뷰] From Pixels to Words -- Towards Native One-Vision Models at Scale

본 논문은 기존의 modular VLM이 가진 복잡한 파이프라인과 파편화된 visual-language 정보를 해결하기 위해 단일화된 Native one-vision 아키텍처를 제안한다.

#Review #Native Vision-Language Models #Monolithic Backbone #Spatiotemporal Attention #One-Vision Foundation Model #End-to-End Learning #Spatial Intelligence

2026년 5월 27일

[논문리뷰] SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

본 연구는 기존 MLLM의 공간 지능 벤치마크가 대부분 깨끗하고 이상적인 환경(Pristine visual inputs)만을 가정하여, 실제 환경에서 발생하는 다양한 시각적 퇴화를 간과하고 있다는 문제를 해결한다.

#Review #Multimodal Large Language Models #Spatial Intelligence #Visual Degradation #3D Gaussian Splatting #Robustness #Benchmarking #Degradation-aware Training

2026년 5월 21일

[논문리뷰] Context Unrolling in Omni Models

본 논문은 다양한 모달리티를 원천 학습하여 모델이 스스로 추론 경로를 구조화하도록 유도하는 Context Unrolling 프레임워크를 제안한다. 모델은 작업 관련 컨텍스트를 선택적으로 활성화하여 공유 작업 공간에 투입하며, 이는 최종 예측 전후로 긴밀하게 작동한다 .

#Review #Multimodal Foundation Model #Context Unrolling #Unified Architecture #Cross-modal Reasoning #Spatial Intelligence #Mixture-of-Experts

2026년 4월 23일

[논문리뷰] OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

본 논문은 기존 MLLM이 언어적 능력에 비해 공간 이해 능력(거리 측정, 다중 뷰 일관성 등)이 현저히 떨어지는 'Spatial Myopia' 문제를 해결하고자 한다. 기존 연구들은 고정된 데이터셋만을 배포할 뿐, 공간 데이터를 생성하는 엔진 자체를 비공개로 운영하여 데이터의 확장성과 재현성을 저해하고 있다.

#Review #Spatial Intelligence #Data Engine #3D Bounding Boxes #Multimodal Large Language Models #Data Scaling #3D Lifting

2026년 4월 9일

[논문리뷰] Learn2Fold: Structured Origami Generation with World Model Planning

Origami는 평면 시트를 복잡한 3D 구조로 변환하는 물리적 지능의 고난도 테스트베드입니다. 이는 단순한 시각적 플라시보가 아니라 기하학적 공리와 엄격한 Kinematic 제약 조건을 준수해야 하며, 작은 오류가 전체 구조의 붕괴를 초래하는 장기적인 추론 작업입니다.

#Review #Origami Generation #Neuro-symbolic Framework #World Model #Constraint-Aware Planning #Program Induction #Spatial Intelligence

2026년 3월 31일

[논문리뷰] Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

인간은 시각적 관찰 스트림을 통해 실제 공간을 인지하고 이해하므로, 잠재적으로 무한한 비디오 스트림에서 Spatial Evidence 를 스트리밍 방식으로 유지하고 업데이트하는 능력은 Spatial Intelligence 에 필수적입니다.

#Review #Spatial Intelligence #Test-Time Training #MLLM #Streaming Video #Hybrid Architecture #Spatiotemporal Convolution

2026년 3월 12일

[논문리뷰] Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

본 논문은 Vision-Language Model (VLM) 의 공간 지능을 스포츠 시나리오에서 벤치마킹하고 발전시키는 것을 목표로 합니다.

#Review #Spatial Intelligence #Vision-Language Models #Sports Analytics #3D Reconstruction #Dataset #Benchmark #Racket Sports #Human-Centric AI

2026년 3월 10일

[논문리뷰] Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

현재 Text-to-Image (T2I) 모델들이 복잡한 공간 관계(공간 인식, 추론, 상호작용) 처리에서 실패하는 한계를 해결하고, 기존의 짧고 정보 밀도가 낮은 프롬프트 기반 벤치마크의 부적합성을 극복하는 것을 목표로 합니다.

#Review #Text-to-Image Models #Spatial Intelligence #Benchmark #Evaluation #Prompt Engineering #Multimodal LLMs #Fine-tuning #Spatial Reasoning

2026년 1월 29일

[논문리뷰] Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

본 논문은 자율 시스템을 위한 진정한 공간 지능(Spatial Intelligence) 을 구축하기 위해 다중 모달(multi-modal) 온보드 센서 데이터 사전 훈련에 대한 포괄적인 로드맵을 제시합니다.

#Review #Multi-modal Pre-training #Autonomous Systems #Spatial Intelligence #Foundation Models #LiDAR-Camera Fusion #Self-Supervised Learning #Generative World Models #Embodied AI

2025년 12월 31일

[논문리뷰] SpatialTree: How Spatial Abilities Branch Out in MLLMs

멀티모달 대규모 언어 모델(MLLM) 내에서 공간 능력의 계층적 구조가 제대로 이해되지 않고 단편적으로 연구되는 문제를 해결하는 것을 목표로 합니다.

#Review #Spatial Intelligence #Multimodal LLMs #Cognitive Hierarchy #Benchmark #Reinforcement Learning #Supervised Fine-tuning #Spatial Reasoning

2025년 12월 23일

[논문리뷰] Scaling Spatial Intelligence with Multimodal Foundation Models

본 연구는 최신 멀티모달 파운데이션 모델(Multimodal Foundation Models, MLLMs)이 가진 공간 지능(spatial intelligence)의 부족함을 해결하고, SenseNova-SI 계열 모델을 통해 대규모 데이터 스케일링을 통해 공간 지능을 효과적으로 육성하는 방법을 탐구하는 것을 목표로 합니다.

#Review #Spatial Intelligence #Multimodal Foundation Models #Data Scaling #Perspective-taking #Visual Question Answering #Emergent Capabilities #Embodied AI #Benchmark Evaluation

2025년 11월 20일

[논문리뷰] Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

이 연구는 최신 MLLM(Multi-modal Large Language Model) , 특히 GPT-5 가 인공 일반 지능(AGI)의 핵심 역량인 공간 이해 및 추론 능력을 얼마나 달성했는지 실증적으로 평가하는 것을 목표로 합니다.

#Review #Spatial Intelligence #Multimodal LLMs #Benchmark Evaluation #GPT-5 #Cognitive AI #AGI

2025년 8월 19일