[Paper Review] The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook. A detailed review of this paper, posted on arXiv by Yongbo He. #Review #Latent Space #Language-based Models #Implicit Reasoning #Multimodal Computation #Embodied AI #Latent Representation #Machine-native (April 2, 2026)
[Paper Review] Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning. A detailed review of this paper, posted on arXiv. #Review #Embodied AI #Vision-Language Models #Episodic Memory #Semantic Consistency #Object Captioning #Data Association (April 2, 2026)
[Paper Review] Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation. A detailed review of this paper, posted on arXiv by Tianqi Liu. #Review #Embodied AI #4D Generative World Model #Spatiotemporal-aware #Kinematic Control #Robotic Simulation #Diffusion Transformer #Pointmap (March 17, 2026)
[Paper Review] MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents. A detailed review of this paper, posted on arXiv. #Review #Egocentric Vision #Multi-Agent Systems #Video Question Answering #Long-Horizon Reasoning #Embodied AI #Benchmark Dataset #Shared Memory #Dynamic Retrieval (March 11, 2026)
[Paper Review] π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs. A detailed review of this paper, posted on arXiv. #Review #Reinforcement Learning (RL) #Flow-based Models #Vision-Language-Action (VLA) Models #Online Learning #Stochastic Differential Equation (SDE) #Contrastive Learning #Embodied AI #Robotics (March 8, 2026)
[Paper Review] EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding. A detailed review of this paper, posted on arXiv by Gim Hee Lee. #Review #3D Gaussian Splatting #Open-Vocabulary #Embodied AI #Online Reconstruction #Semantic 3D Scene Understanding #CLIP Features #Feed-Forward Neural Networks (March 4, 2026)
[Paper Review] EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents. A detailed review of this paper, posted on arXiv by Xuqian Ren. #Review #Embodied AI #4D Reconstruction #Human-Scene Interaction #iPhone RGB-D #In-the-Wild Mocap #Physics-based Animation #Humanoid Robot Control #Low-Cost Data Collection (February 26, 2026)
[Paper Review] Solaris: Building a Multiplayer Video World Model in Minecraft. A detailed review of this paper, posted on arXiv by Timothy Meehan. #Review #Multi-agent World Models #Video Diffusion Models #Minecraft #Self Forcing #Checkpointed Self Forcing #Multi-view Consistency #Data Collection #Embodied AI (February 25, 2026)
[Paper Review] From Perception to Action: An Interactive Benchmark for Vision Reasoning. A detailed review of this paper, posted on arXiv by Zhiqiang Hu. #Review #Vision-Language Models #Physical Reasoning #Interactive AI #3D Benchmark #Mechanical Puzzles #Spatial Packing #Embodied AI (February 24, 2026)
[Paper Review] BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models. A detailed review of this paper, posted on arXiv. #Review #Bimanual Manipulation #MLLMs #Robotics Benchmark #Spatial Reasoning #Action Planning #End-Effector Control #Embodied AI #Multimodal LLMs (February 18, 2026)
[Paper Review] ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning. A detailed review of this paper, posted on arXiv. #Review #Robotic Manipulation #Vision-Language-Action (VLA) #Foundation Models #Action Manifold Learning #Diffusion Transformers #Data Curation #Embodied AI (February 15, 2026)
[Paper Review] Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation. A detailed review of this paper, posted on arXiv by Yukuan Xu. #Review #Vision-Language Navigation #Beyond-the-View Navigation #Video Generation Models #Sparse Video Generation #Diffusion Models #Embodied AI #Real-world Navigation #Long-horizon Planning (February 12, 2026)
[Paper Review] PhyCritic: Multimodal Critic Models for Physical AI. A detailed review of this paper, posted on arXiv. #Review #Multimodal Critics #Physical AI #Reinforcement Learning #Self-Referential Finetuning #Evaluation Models #Causal Reasoning #Embodied AI #RLVR (February 11, 2026)
[Paper Review] SAGE: Scalable Agentic 3D Scene Generation for Embodied AI. A detailed review of this paper, posted on arXiv. #Review #Embodied AI #3D Scene Generation #Agentic Framework #Simulation-Ready Environments #Robot Policy Learning #Large Language Models (LLM) #Physics Simulation #Data Augmentation (February 10, 2026)
[Paper Review] BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation. A detailed review of this paper, posted on arXiv by Xiaoyu Chen. #Review #Long-horizon manipulation #Embodied AI #Vision-Language-Action (VLA) #Interleaved planning #Visual forecasting #Residual Flow Guidance #Multimodal learning (February 10, 2026)
[Paper Review] Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks. A detailed review of this paper, posted on arXiv. #Review #World Models #Unified Framework #Multimodal AI #Embodied AI #Physical Understanding #Long-term Consistency #AI Agents #Generative Models (February 3, 2026)
[Paper Review] PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction. A detailed review of this paper, posted on arXiv. #Review #Streaming 3D Reconstruction #Hybrid Representation #Triangle Primitives #Neural Gaussians #Geometric Accuracy #High-Fidelity Rendering #Embodied AI #Monocular SLAM (January 29, 2026)
[Paper Review] Advancing Open-source World Models. A detailed review of this paper, posted on arXiv. #Review #World Models #Open-source AI #Video Generation #Real-time Simulation #Long-term Memory #Action-Conditioned Learning #Generative Models #Embodied AI (January 28, 2026)
[Paper Review] TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action (VLA) #Embodied AI #Robotics #Catastrophic Forgetting #Asymmetric Mixture-of-Transformers (AsyMoT) #Generalist VLM #Specialist VLM #Flow-Matching (January 25, 2026)
[Paper Review] RoboBrain 2.5: Depth in Sight, Time in Mind. A detailed review of this paper, posted on arXiv by Yuheng Ji. #Review #Embodied AI #Foundation Model #3D Spatial Reasoning #Temporal Value Estimation #Robotics #Manipulation #Multimodal Learning (January 21, 2026)
[Paper Review] Rethinking Video Generation Model for the Embodied World. A detailed review of this paper, posted on arXiv. #Review #Video Generation #Embodied AI #Robotics Benchmark #RBench #Robotics Dataset #RoVid-X #Physical Plausibility #Task Completion (January 21, 2026)
[Paper Review] FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation. A detailed review of this paper, posted on arXiv. #Review #Vision-Language Navigation #Chain-of-Thought Reasoning #Multimodal AI #Implicit Reasoning #Visual AutoRegressor #Embodied AI #Long-Horizon Planning (January 20, 2026)
[Paper Review] Aligning Agentic World Models via Knowledgeable Experience Learning. A detailed review of this paper, posted on arXiv. #Review #Agentic AI #World Models #Experience Learning #LLMs #Physical Hallucinations #Embodied AI #Predictive Coding #Knowledge Repository (January 20, 2026)
[Paper Review] Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action #Embodied AI #Latent Planning #Chain-of-Thought #Distillation #Inference Efficiency #Robotic Manipulation #Preference Learning (January 14, 2026)
[Paper Review] NitroGen: An Open Foundation Model for Generalist Gaming Agents. A detailed review of this paper, posted on arXiv. #Review #Generalist Agents #Foundation Models #Behavior Cloning #Video Games #Action Extraction #Multi-game #Embodied AI (January 6, 2026)
[Paper Review] Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems. A detailed review of this paper, posted on arXiv. #Review #Multi-modal Pre-training #Autonomous Systems #Spatial Intelligence #Foundation Models #LiDAR-Camera Fusion #Self-Supervised Learning #Generative World Models #Embodied AI (December 31, 2025)
[Paper Review] VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs. A detailed review of this paper, posted on arXiv by Xihui Liu. #Review #Embodied AI #Vision and Language Navigation #Instance Object Navigation #Active Dialog #Large Language Models (LLMs) #Benchmark #Human-Robot Interaction (December 29, 2025)
[Paper Review] QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models. A detailed review of this paper, posted on arXiv. #Review #Vision-Language Models #Physical Reasoning #Quantitative Benchmark #Kinematics #Mean Relative Accuracy #Video-Text #Embodied AI (December 23, 2025)
[Paper Review] PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence. A detailed review of this paper, posted on arXiv. #Review #Egocentric Data #Physical Intelligence #VLM #Robot Control #Embodied AI #VQA Supervision #Human-Robot Interaction #Zero-shot Transfer (December 21, 2025)
[Paper Review] Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection. A detailed review of this paper, posted on arXiv. #Review #Active Perception #Vision-Language Models (VLMs) #Embodied AI #View Selection #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Visual Question Answering (VQA) #3D Environments (December 15, 2025)
[Paper Review] Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge. A detailed review of this paper, posted on arXiv by Jinwei Gu. #Review #Embodied AI #Long-horizon Tasks #Vision-Language-Action Models (VLA) #BEHAVIOR Challenge #Offline RL #Pre-training #Rejection Sampling Fine-Tuning (RFT) #Robotics (December 15, 2025)
[Paper Review] Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge. A detailed review of this paper, posted on arXiv by Akash Karnatak. #Review #Vision-Language-Action (VLA) models #Flow Matching #Embodied AI #Robot Manipulation #BEHAVIOR Challenge #Correlated Noise #Stage Tracking #Multi-Task Learning (December 14, 2025)
[Paper Review] LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator. A detailed review of this paper, posted on arXiv. #Review #Robotic Agent #Large Language Models (LLMs) #Embodied AI #Task Planning #Human-Robot Interaction #General-purpose Robotics #ROS (December 14, 2025)
[Paper Review] SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization. A detailed review of this paper, posted on arXiv. #Review #Vision-Language Navigation #Large Vision-Language Models #Visual Prompt #Reinforcement Fine-Tuning #Policy Optimization #Embodied AI #Spatial Reasoning #Perception Errors (December 4, 2025)
[Paper Review] SIMA 2: A Generalist Embodied Agent for Virtual Worlds. A detailed review of this paper, posted on arXiv. #Review #Embodied AI #Generalist Agent #Virtual Worlds #Foundation Models #Gemini #Self-Improvement #Dialogue #Reasoning #Reinforcement Learning (December 4, 2025)
[Paper Review] EgoLCD: Egocentric Video Generation with Long Context Diffusion. A detailed review of this paper, posted on arXiv. #Review #Egocentric Video Generation #Long-Context Diffusion #Long-Short Memory #Sparse KV Cache #Memory Regulation Loss #Structured Narrative Prompting #World Models #Embodied AI (December 4, 2025)
[Paper Review] 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer. A detailed review of this paper, posted on arXiv. #Review #4D Scene Understanding #Language Grounding #Transformer #Feed-forward Network #Semantic Field #Geometry Reconstruction #Embodied AI (December 4, 2025)
[Paper Review] SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL. A detailed review of this paper, posted on arXiv. #Review #Spatial Reasoning #Vision Language Models #Reinforcement Learning #Tool Augmentation #Robotics #Multi-Tool Use #Embodied AI (December 3, 2025)
[Paper Review] MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory. A detailed review of this paper, posted on arXiv. #Review #Visual Navigation #Dual-Scale Framework #Sparse Spatial Memory Graph #Memory-Guided Planning #Geometry-Enhanced Control #Zero-Shot Navigation #Embodied AI (December 2, 2025)
[Paper Review] DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action. A detailed review of this paper, posted on arXiv by Zhuoyang Liu. #Review #Vision-Language-Action (VLA) #Embodied AI #Action Degeneration #Data Pruning #Knowledge Distillation #Multi-modal Reasoning #Robot Learning #VLA Score (November 30, 2025)
[Paper Review] MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots. A detailed review of this paper, posted on arXiv by Rui Yang. #Review #Vision-Language-Action (VLA) #Mobile Robotics #Quadruped Robots #Chain-of-Thought (CoT) #Reinforcement Learning (RL) #Embodied AI #Multimodal Perception (November 26, 2025)
[Paper Review] GigaWorld-0: World Models as Data Engine to Empower Embodied AI. A detailed review of this paper, posted on arXiv by Chaojun Ni. #Review #World Models #Embodied AI #Data Generation #Video Generation #3D Scene Reconstruction #Robotics #Vision-Language-Action (November 25, 2025)
[Paper Review] Scaling Spatial Intelligence with Multimodal Foundation Models. A detailed review of this paper, posted on arXiv. #Review #Spatial Intelligence #Multimodal Foundation Models #Data Scaling #Perspective-taking #Visual Question Answering #Emergent Capabilities #Embodied AI #Benchmark Evaluation (November 20, 2025)
[Paper Review] MiMo-Embodied: X-Embodied Foundation Model Technical Report. A detailed review of this paper, posted on arXiv. #Review #Vision-Language Model (VLM) #Embodied AI #Autonomous Driving #Foundation Model #Multimodal Learning #Task Planning #Affordance Prediction #Spatial Understanding #Reinforcement Learning (November 20, 2025)
[Paper Review] FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI. A detailed review of this paper, posted on arXiv by Xinyu Yin. #Review #Embodied AI #Vision-and-Language Navigation (VLN) #LLM-driven Simulation #Human-Agent Interaction #Closed-Loop #Benchmark Dataset #Social Cognition (November 19, 2025)
[Paper Review] NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action Model #Direct Preference Optimization #World Model #Reward Learning #Robotics #Embodied AI #Flow-Matching (November 17, 2025)
[Paper Review] 10 Open Challenges Steering the Future of Vision-Language-Action Models. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action Models #Embodied AI #Robotics #Multimodal Perception #Cross-Robot Generalization #Hierarchical Planning #World Models #Robot Safety (November 10, 2025)
[Paper Review] LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation. A detailed review of this paper, posted on arXiv by Soohyun Oh. #Review #3D Scene Synthesis #Fine-Grained Evaluation #Tool-Augmented LLMs #Embodied AI #Vision-Language Models #Benchmark #Multi-Hop Grounding (November 9, 2025)
[Paper Review] RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies. A detailed review of this paper, posted on arXiv. #Review #Robotics #Real-robot Evaluation #Embodied AI #Vision-Language-Action Models #Benchmarking #Online Testing System #Robotics Control #Large-scale Evaluation (November 9, 2025)
[Paper Review] Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action (VLA) #Diffusion Models #Discrete Denoising #Multimodal Learning #Robotics #Embodied AI #Joint Generation #Action Prediction (November 9, 2025)
[Paper Review] A Survey on Efficient Vision-Language-Action Models. A detailed review of this paper, posted on arXiv. #Review #Embodied AI #Robotic Manipulation #VLA Models #Efficient AI #Model Compression #Efficient Training #Data Collection #Multimodal AI (November 9, 2025)
[Paper Review] Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks. A detailed review of this paper, posted on arXiv. #Review #Multimodal Large Language Models #Spatial Reasoning #Survey #Benchmarks #3D Vision #Embodied AI #Vision-Language Navigation (October 30, 2025)
[Paper Review] From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action (VLA) #3D Spatial Reasoning #Embodied AI #Foundation Models #Multimodal Fusion #Robot Manipulation #Modality Transferability #Action Grounding (October 29, 2025)
[Paper Review] VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting. A detailed review of this paper, posted on arXiv by Haihan Gao. #Review #Embodied AI #Human-Robot Interaction #Vision-Language Models #Concurrency #Interruption #Robotics Control #Dual-Model Architecture #Special Tokens (October 28, 2025)
[Paper Review] PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments. A detailed review of this paper, posted on arXiv by Chaoyang Zhao. #Review #Active Visual Reasoning #MLLM #Physical Environments #Partially Observable #Markov Decision Process #Chain-of-Thought #Embodied AI #CLEVR-AVR (October 27, 2025)
[Paper Review] Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets. A detailed review of this paper, posted on arXiv. #Review #3D Asset Generation #Simulation-Ready Assets #Diffusion Models #Physically Based Rendering (PBR) #Embodied AI #Robotic Simulation #Image-to-3D #Foundation Model (October 24, 2025)
[Paper Review] GigaBrain-0: A World Model-Powered Vision-Language-Action Model. A detailed review of this paper, posted on arXiv. #Review #Vision-Language-Action Model #World Model #Data Augmentation #Robot Generalization #Embodied AI #RGBD #Chain-of-Thought (October 23, 2025)
[Paper Review] World-in-World: World Models in a Closed-Loop World. A detailed review of this paper, posted on arXiv by Arda Uzunoglu. #Review #World Models #Embodied AI #Closed-Loop Evaluation #Online Planning #Data Scaling #Controllability #Robotic Manipulation (October 22, 2025)
[Paper Review] ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning. A detailed review of this paper, posted on arXiv. #Review #Embodied AI #Vision Language Models (VLMs) #Reinforcement Learning (RL) #Prior Learning #Supervised Fine-tuning (SFT) #Embodied Agents (October 15, 2025)
[Paper Review] PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs. A detailed review of this paper, posted on arXiv by Xu Zheng. #Review #Multimodal Large Language Models (MLLMs) #Physical Tool Understanding #Benchmarking #Embodied AI #Visual Question Answering (VQA) #Tool Affordances #Reasoning (October 13, 2025)
[Paper Review] D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. A detailed review of this paper, posted on arXiv by Haebin Seong. #Review #Embodied AI #Vision-Action Pretraining #Desktop Data #Inverse Dynamics Model (IDM) #Pseudo-labeling #Robotics #Generalization #Data Compression (October 13, 2025)
[Paper Review] VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators. A detailed review of this paper, posted on arXiv by Zirui Ge. #Review #Vision-Language-Action Models #Reinforcement Learning #World Models #Fine-tuning #Embodied AI #Robotics #Reward Design #Distribution Shift (October 2, 2025)
[Paper Review] OceanGym: A Benchmark Environment for Underwater Embodied Agents. A detailed review of this paper, posted on arXiv. #Review #Underwater Robotics #Embodied AI #Benchmark Environment #Multi-modal Large Language Models #Autonomous Underwater Vehicles #Perception #Decision-Making #Simulation (October 1, 2025)
[Paper Review] WoW: Towards a World omniscient World model Through Embodied Interaction. A detailed review of this paper, posted on arXiv by Weishi Mi. #Review #World Model #Embodied AI #Robotics #Diffusion Models #Physical Reasoning #Vision Language Models #Interaction Data #Self-Optimization (September 29, 2025)
[Paper Review] SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent. A detailed review of this paper, posted on arXiv by Siyuan Huang. #Review #3D Scene Synthesis #Agentic Framework #LLMs #Self-Reflection #Tool-Use #Physical Plausibility #Iterative Refinement #Embodied AI (September 26, 2025)
[Paper Review] Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue. A detailed review of this paper, posted on arXiv by Hui Zhang. #Review #Embodied AI #Human-Robot Interaction #Multi-turn Dialogue #Instruction Following #Vision-Language Models #Diffusion Models #Ambiguity Resolution #Low-level Actions (September 22, 2025)
[Paper Review] PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era. A detailed review of this paper, posted on arXiv by Zihao Dongfang. #Review #Omnidirectional Vision #Embodied AI #Panoramic Perception #Multi-modal Learning #Dataset Development #Robot Navigation #Spatial Reasoning #System Architecture (September 18, 2025)
[Paper Review] InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts. A detailed review of this paper, posted on arXiv by Wenzhe Cai. #Review #Embodied AI #3D Scene Dataset #Simulation Environment #Scene Generation #Point-Goal Navigation #Realistic Layouts #Object Interaction #Real-to-Sim (September 16, 2025)
[Paper Review] OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning. A detailed review of this paper, posted on arXiv by Yuzheng Zhuang. #Review #Embodied AI #Multimodal LLMs #3D Grounding #Task-Adaptive Reasoning #Embodiment-Aware Planning #Robotics #Spatial Reasoning (September 12, 2025)
[Paper Review] F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions. A detailed review of this paper, posted on arXiv by Zherui Qiu. #Review #Vision-Language-Action #Embodied AI #Visual Foresight #Predictive Inverse Dynamics #Mixture-of-Transformer #Robot Manipulation #Multi-stage Training #Generalization (September 10, 2025)
[Paper Review] Robix: A Unified Model for Robot Interaction, Reasoning and Planning. A detailed review of this paper, posted on arXiv by Zixuan Wang. #Review #Robot Learning #Vision-Language Models (VLMs) #Embodied AI #Human-Robot Interaction (HRI) #Task Planning #Reinforcement Learning (RL) #Chain-of-Thought (CoT) Reasoning #Robotics (September 4, 2025)
[Paper Review] EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control. A detailed review of this paper, posted on arXiv by Zhaoqing Chen. #Review #Embodied AI #Robot Control #Vision-Language-Action Models #Multimodal Pretraining #Flow Matching #Foundation Models #Generalization #Real-world Robotics (September 1, 2025)
[Paper Review] RynnEC: Bringing MLLMs into Embodied World. A detailed review of this paper, posted on arXiv by jiangpinliu. #Review #Multi-modal Large Language Models #Embodied AI #Embodied Cognition #Video Understanding #Instance Segmentation #Spatial Reasoning #Robotics (August 21, 2025)
[Paper Review] Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation. A detailed review of this paper, posted on arXiv by Fei Ni. #Review #Embodied AI #Robotic Manipulation #Reinforcement Learning #Vision-Language Model #Pointing #Zero-shot Generalization (August 20, 2025)
[Paper Review] OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks. A detailed review of this paper, posted on arXiv by Hongxing Li. #Review #Embodied AI #Agent Reasoning #LLM #Benchmarking #Tool Use #Multi-Agent Systems #Physical Interaction #Constraint Reasoning (August 12, 2025)
[Paper Review] Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation. A detailed review of this paper, posted on arXiv by Shengcong Chen. #Review #Robotic Manipulation #World Model #Video Generation #Diffusion Model #Embodied AI #Foundation Model #Robotics Simulation #Policy Learning (August 8, 2025)
[Paper Review] Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success. A detailed review of this paper, posted on arXiv by Ruslan Rakhimov. #Review #Reinforcement Learning #Vision-Language Models #Synthetic Worlds #Transfer Learning #PPO #Actor-Critic #Embodied AI (August 7, 2025)
[Paper Review] RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems. A detailed review of this paper, posted on arXiv by Junkun Hong. #Review #Brain-inspired AI #Lifelong Learning #Embodied AI #Multi-memory Systems #Knowledge Graph #Robotics #Closed-Loop Planning (August 5, 2025)
[Paper Review] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation. A detailed review of this paper, posted on arXiv by Jianjiang Feng. #Review #Image-goal Navigation #3D Gaussian Splatting (3DGS) #Incremental Scene Representation #Coarse-to-fine Localization #Embodied AI #Robotics #Differentiable Rendering (August 4, 2025)