#Embodied AI

95 posts

[Paper Review] Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

[Paper Review] Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

[Paper Review] RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

[Paper Review] MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

[Paper Review] Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

[Paper Review] SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

[Paper Review] StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

[Paper Review] Audio-Visual Intelligence in Large Foundation Models

[Paper Review] Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

[Paper Review] PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

[Paper Review] π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

[Paper Review] EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

[Paper Review] Solaris: Building a Multiplayer Video World Model in Minecraft

[Paper Review] From Perception to Action: An Interactive Benchmark for Vision Reasoning

[Paper Review] BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models

[Paper Review] ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

[Paper Review] SAGE: Scalable Agentic 3D Scene Generation for Embodied AI

[Paper Review] BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation

[Paper Review] Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

[Paper Review] PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction

[Paper Review] Advancing Open-source World Models

[Paper Review] TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

[Paper Review] RoboBrain 2.5: Depth in Sight, Time in Mind

[Paper Review] Rethinking Video Generation Model for the Embodied World

[Paper Review] FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

[Paper Review] Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

[Paper Review] Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

[Paper Review] VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

[Paper Review] Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

[Paper Review] Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

[Paper Review] Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

[Paper Review] SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization

[Paper Review] EgoLCD: Egocentric Video Generation with Long Context Diffusion

[Paper Review] Scaling Spatial Intelligence with Multimodal Foundation Models

[Paper Review] MiMo-Embodied: X-Embodied Foundation Model Technical Report

[Paper Review] 10 Open Challenges Steering the Future of Vision-Language-Action Models

[Paper Review] RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies

[Paper Review] WoW: Towards a World omniscient World model Through Embodied Interaction

[Paper Review] SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

[Paper Review] PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era

[Paper Review] OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

[Paper Review] Robix: A Unified Model for Robot Interaction, Reasoning and Planning

[Paper Review] EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

[Paper Review] Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

[Paper Review] OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

[Paper Review] Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

[Paper Review] Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success

[Paper Review] RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

[Paper Review] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

[Paper Review] Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

[Paper Review] VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

[Paper Review] PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

[Paper Review] D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

[Paper Review] Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

[Paper Review] OceanGym: A Benchmark Environment for Underwater Embodied Agents
