#Embodied Intelligence

9 posts

[Paper Review] Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

This paper proposes a unified model to address the fragmentation problem that arises because existing embodied AI models are highly specialized for particular tasks or robot platforms. Current approaches are limited by low data utilization and restricted generalization performance.

#Review #Embodied Intelligence #Vision-Language-Action Models #Flow-matching #Multi-task Learning #Cross-embodiment #Reinforcement Learning

May 28, 2026

[Paper Review] GEM: Generative Supervision Helps Embodied Intelligence

This paper addresses a limitation of current embodied VLMs: they are adept at high-level linguistic reasoning, but fail to integrate the fine-grained spatial structure and physical perception needed to control robots in real physical environments.

#Review #Embodied Intelligence #Vision-Language Models #Generative Supervision #Depth Map Prediction #Diffusion Transformer #Robot Manipulation #Spatiotemporal Planning

May 27, 2026

[Paper Review] PhysBrain 1.0 Technical Report

This paper aims to overcome the limitations of the platform-dependent robot trajectory data collection that existing VLA systems rely on, and to acquire a fundamental understanding of the physical environment (physical commonsense).

#Review #Vision-Language-Action Models #Embodied Intelligence #Physical Commonsense #Egocentric Video #Data Engine #VLA Adaptation

May 17, 2026

[Paper Review] A Benchmark for Interactive World Models with a Unified Action Generation Framework

This paper addresses the difficulty of objectively evaluating the physical interaction capabilities of interactive world models, which stems from the absence of large-scale datasets and a unified benchmark.

#Review #Interactive World Models #Benchmark #Action Generation Framework #Embodied Intelligence #Trajectory Following #Memory Ability

May 5, 2026

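[Paper Review] Experience Transfer for Multimodal LLM Agents in Minecraft Game

Through the Echo framework, this paper decomposes environment knowledge into five transfer dimensions and uses the Contextual State Descriptor (CSD) to convert and manage it in a unified semantic form. CSD combines visual and textual information with vectorized embeddings and stores them in a memory bank, which lets the In-Context Analogy Learning (ICAL) algorithm precisely retrieve relevant experiences.

#Review #Multimodal LLM Agent #Experience Transfer #In-Context Analogy Learning (ICAL) #Minecraft #Contextual State Descriptor (CSD) #Embodied Intelligence

April 7, 2026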

[Paper Review] CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

Recent advances in the low-altitude economy, embodied intelligence, and air-ground cooperative systems have sharply increased the need for infrastructure that can simulate ground and aerial agents together.

#Review #Embodied Intelligence #Simulation Infrastructure #CARLA #AirSim #Air-Ground Cooperation #Unreal Engine

March 31, 2026

[Paper Review] StreamingClaw Technical Report

Applications such as embodied intelligence, AI hardware, autonomous driving, and intelligent cockpits rely heavily on a real-time perception-decision-action closed loop, which imposes strict requirements on real-time streaming video understanding.

#Review #Streaming Video Understanding #Embodied Intelligence #Multi-agent Systems #Long-term Memory #Proactive Interaction #Real-time Inference #OpenClaw

March 25, 2026

[Paper Review] Chain of World: World Model Thinking in Latent Motion

This paper aims to overcome the limitations of existing Vision-Language-Action (VLA) models, which lack predictive capability and are inefficient because they reconstruct visually redundant content, and to address the lack of continuous dynamics modeling and world knowledge in latent action models.

#Review #Vision-Language-Action Models #World Models #Latent Motion #Embodied Intelligence #Temporal Reasoning #Disentangled Representation #Robotics #Pretraining

March 3, 2026

[Paper Review] An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

This paper aims to provide a clear, structured guide to the rapidly evolving field of Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #Embodied Intelligence #Robotics #Foundation Models #Multi-modal Learning #Reinforcement Learning #Sim-to-Real Transfer #Human-Robot Interaction

December 21, 2025