#Robotic Manipulation

44 posts

[Paper Review] Dual Latent Memory in Vision-Language-Action Models for Robotic Manipulation

This paper aims to resolve the temporal short-horizon bias that existing VLA models suffer from because of their Markovian assumption.

#Review #Vision-Language-Action Models #Latent Memory #Robotic Manipulation #Long-horizon Tasks #Dual-scale Vault #Memory-augmented Reasoning

July 8, 2026

[Paper Review] RynnWorld-4D: 4D Embodied World Models for Robotic Manipulation

Existing world models for robotic manipulation rely mainly on 2D pixel-based video generation, which limits their ability to capture the precise 3D spatial relationships and physical consistency that real robot systems require.

#Review #4D Embodied World Models #Robotic Manipulation #Generative Video Models #RGB-DF Representation #Flow Matching #Joint Cross-Modal Attention #Embodied AI

July 7, 2026

[Paper Review] From Foundation to Application: Improving VLA Models in Practice

This paper starts from the observation that existing VLA foundation models perform well on lab benchmarks but still fall short under the diverse hardware configurations and complex task conditions of real robot environments.

#Review #Vision-Language-Action (VLA) #Mixture-of-Experts (MoE) #Embodiment Generalization #Dual-Query Distillation #Robotic Manipulation #Spatiotemporal Reasoning

July 7, 2026

[Paper Review] Learning to Move Before Learning to Do: Task-Agnostic pretraining for VLAs

To address the data scarcity that currently constrains VLA training, this paper proposes a new pretraining framework that removes the dependence on large-scale expert demonstrations. Existing VLA models rely heavily on expensive human manipulation data, which is a fundamental bottleneck for scaling data collection.

#Review #Vision-Language-Action Models #Task-Agnostic Pretraining #Embodied AI #Inverse Dynamics #Physical Grounding #Robotic Manipulation

July 2, 2026

[Paper Review] PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

This paper aims to address the physical implausibility that state-of-the-art video generation models exhibit on robotic manipulation tasks.

#Review #Embodied Intelligence #Video Generation #World Models #Physics-aware #Robotic Manipulation #Hierarchical Alignment

June 28, 2026

[Paper Review] World Value Models for Robotic Manipulation

This work starts from the problem that existing robot value models rely on static, image-based VLM backbones and therefore struggle to understand long-range temporal context and future outcomes.

#Review #World Models #Robotic Manipulation #Value Estimation #Flow Matching #Distributional Value #Suboptimal-Value-Bench

June 23, 2026

[Paper Review] EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

This paper addresses the problem that existing VLA models operate under a strict Markovian assumption and therefore cannot properly handle occlusions or transient changes in visual information during long-horizon tasks.

#Review #Vision-Language-Action Models #Robotic Manipulation #Long-Horizon #Memory-Augmented #Keyframe Evidence Memory #Non-Markovian

June 23, 2026

[Paper Review] PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Most existing World Foundation Models either operate on a single view or simply concatenate multiple views along the sequence dimension, and so fail to deliver the 3D consistency that robotic manipulation requires.

#Review #World Foundation Model #Robotic Manipulation #3D Consistency #Diffusion Transformer #Flow Matching #Multi-view Generation

June 17, 2026

[Paper Review] WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

This paper aims to address the high latency and limited context length that existing world models face on complex manipulation tasks.

#Review #World Model #Robotic Manipulation #Autoregressive Inference #Transformer #Efficiency #Generative Modeling

June 11, 2026

[Paper Review] Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

This paper examines whether video generation models go beyond producing visually plausible footage and actually function as 'World Models' that have internalized real physical laws.

#Review #Video Generation Models #Robotic Manipulation #Physical Executability #Benchmark #Sim-to-Real #World Models

June 4, 2026

[Paper Review] GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation

This paper aims to address the problem that, in modern robot learning, policy models keep growing in complexity while simulation environments capable of evaluating them reliably have become a bottleneck.

#Review #Robotic Manipulation #Video World Simulator #Action-Conditioned Generation #Closed-loop Evaluation #Proprioceptive State Expert #World Judge

May 27, 2026

[Paper Review] Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

This paper focuses on the Dynamics-Blindness problem that arises as modern VLA models adopt Action Chunking. Because most VLA models predict future actions from a single static frame, they cannot respond to changes in the environment that occur during execution.

#Review #Vision-Language-Action Models #Action Chunking #Robotic Manipulation #Dynamic Environments #Inference-time Wrapper #Closed-form Optimization

May 14, 2026

[Paper Review] RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

This paper aims to address the fundamental shortage of task-specific physical interaction data in robotic manipulation.

#Review #Robotic Manipulation #Vision-Language Models #Video Generation Models #Self-Evolving Framework #Complementary Learning Systems #Data Efficiency #Reinforcement Learning

May 13, 2026

[Paper Review] Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

This paper aims to address two problems with existing Unified World Models: they are confined to 2D pixel space and so lack an understanding of geometric structure, and they fail to strike an efficient balance between high-dimensional video generation and low-dimensional action prediction.

#Review #Embodied AI #World Models #Diffusion Transformer #3D Reconstruction #Robotic Manipulation #Asynchronous Denoising #Unified Modeling

April 29, 2026

[Paper Review] Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

This paper proposes Cortex 2.0, an architecture that integrates a world model into the robot control loop to add future prediction and evaluation steps. Given the current observation, the proposed model uses the world model to generate $k$ candidate future trajectories in a visual latent space.

#Review #Vision-Language-Action Models #World Models #Robotic Manipulation #Plan-and-Act #Process-Reward Operator #Flow Matching #Cross-Embodiment

April 22, 2026

[Paper Review] LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models

The authors build LIBERO-Para, which covers 43 fine-grained variation types based on the core components of robot manipulation instructions: actions and objects. They also propose the PRIDE metric, which combines task success with the keyword similarity ($S_K$) and structural similarity ($S_T$) between the original instruction and its paraphrase, enabling a more interpretable robustness evaluation.

#Review #Vision-Language-Action (VLA) Models #Paraphrase Robustness #Robotic Manipulation #Diagnostic Benchmark #PRIDE Metric #Object Grounding #Trajectory Divergence

April 6, 2026

[Paper Review] MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

Existing VLA models rely on hierarchical structures or autoregressive paradigms, which leads to architectural overhead, a lack of long-term temporal consistency, and the absence of an explicit mechanism for capturing environment dynamics.

#Review #Vision-Language-Action (VLA) #Discrete Diffusion #Multi-modal Generation #Robotic Manipulation #Action Chunking #World Model #Hybrid Attention

April 1, 2026

[Paper Review] Demystifying Action Space Design for Robotic Manipulation Policies

The goal is to address the problem that action space design in robot manipulation policy learning is driven largely by empirical heuristics, leaving optimization and stability without a systematic understanding. This study, along the temporal (absolute vs. delta) and spatial (joint-space vs.

#Review #Robotic Manipulation #Action Space Design #Imitation Learning #Delta Actions #Joint Space Control #Task Space Control #Generalization #Control Stability

March 8, 2026

[Paper Review] SimVLA: A Simple VLA Baseline for Robotic Manipulation

To address the difficulty of pinpointing the exact source of performance gains in the fast-moving VLA research field, this paper proposes SimVLA, a streamlined VLA baseline.

#Review #Robotic Manipulation #Vision-Language-Action (VLA) Models #Baseline Model #Modular Design #Flow Matching #Zero-Shot Generalization #Standardized Training #Efficiency

February 23, 2026

[Paper Review] RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

This paper aims to overcome the limitations of existing Supervised Fine-Tuning (SFT)-based sim-real co-training for Vision-Language-Action (VLA) models, which treats simulation only as a static data source and underuses closed-loop interaction.

#Review #Reinforcement Learning #Sim-to-Real #Co-training #VLA Models #Robotic Manipulation #Supervised Fine-tuning #Catastrophic Forgetting

February 15, 2026

[Paper Review] ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

This paper aims to address the fundamental difficulty of building general-purpose embodied agents across diverse robot hardware, which stems from fragmented data, inconsistent representations, and imbalanced training objectives.

#Review #Robotic Manipulation #Vision-Language-Action (VLA) #Foundation Models #Action Manifold Learning #Diffusion Transformers #Data Curation #Embodied AI

February 15, 2026

[Paper Review] $\chi_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

This paper aims to achieve production-level robot robustness by resolving the distributional inconsistencies that arise in long-horizon robotic manipulation tasks.

#Review #Robotic Manipulation #Distributional Shift #Imitation Learning #Model Arithmetic #Stage Advantage #Train-Deploy Alignment #Resource-Efficient AI #Long-Horizon Tasks

February 12, 2026

[Paper Review] RISE: Self-Improving Robot Policy with Compositional World Model

This paper aims to address two limitations: Vision-Language-Action (VLA) models remain brittle on contact-rich, dynamic robotic manipulation tasks, and on-policy reinforcement learning in the physical world is hard to scale because of hardware cost, slow interaction, and manual resets.

#Review #Robot Learning #Reinforcement Learning #World Models #Compositional Models #Robotic Manipulation #Self-Improving #Vision-Language-Action (VLA)

February 12, 2026

[Paper Review] GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

This paper aims to address the limits on long-horizon action planning that stem from the limited scene understanding and weak future prediction of current Vision-Language-Action (VLA) models.

#Review #VLA Models #World Models #Reinforcement Learning #Robotic Manipulation #Long-Horizon Control #Human-in-the-Loop #Continual Learning

February 12, 2026

[Paper Review] SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

This paper aims to address the problem that the fixed inference pipeline of Vision-Language-Action (VLA) models accumulates errors in uncertain situations such as perceptual ambiguity or action multimodality.

#Review #Vision-Language-Action Models #Self-Uncertainty Estimation #Adaptive Inference #Active Perception #Action Decoding #Visual Attention #Robotic Manipulation

February 10, 2026

[Paper Review] SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation

This paper aims to address the fundamental problem of accurately and stably simulating the dynamics of deformable objects amid the complex interactions that arise during robotic soft-body manipulation.

#Review #Neural Simulator #Real-to-Sim (R2S) #Robotic Manipulation #Soft-body Dynamics #Gaussian Splatting #Deformable Objects #Action-conditioned Simulation #Long-horizon Simulation

February 4, 2026

[Paper Review] Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

This paper aims to address the high inference latency that existing reasoning VLA models incur on complex vision-language-action (VLA) tasks because of lengthy chain-of-thought (CoT) reasoning.

#Review #Vision-Language-Action #Embodied AI #Latent Planning #Chain-of-Thought #Distillation #Inference Efficiency #Robotic Manipulation #Preference Learning

January 14, 2026

[Paper Review] Act2Goal: From World Model To General Goal-conditioned Policy

This paper aims to address the problems that existing goal-conditioned policies (GCPs) face in long-horizon robotic manipulation, namely the difficulty of maintaining long-term consistency and insufficient responsiveness to local disturbances.

#Review #Goal-Conditioned Policy #World Models #Robotic Manipulation #Multi-Scale Temporal Hashing #Online Adaptation #Hindsight Experience Replay #LoRA Finetuning #Zero-shot Generalization

December 29, 2025

[Paper Review] MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

This paper aims to synthesize physically plausible and logically coherent long-horizon robotic manipulation videos, overcoming both the scarcity of diverse long-horizon manipulation data and the limitations of existing video generation models. In particular, it focuses on enabling autonomous data synthesis without relying on manually defined trajectories.

#Review #Video Generation #Robotic Manipulation #Hierarchical Framework #Reinforcement Learning #Diffusion Models #World Models #Cognitive Science #Physical Alignment

December 9, 2025

[Paper Review] Mixture of Horizons in Action Chunking

This paper aims to address the fundamental limitations caused by a fixed action chunk length (horizon) in Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #Action Chunking #Robotic Manipulation #Multi-horizon Planning #Transformer Architecture #Gated Fusion #Dynamic Inference

December 2, 2025

[Paper Review] GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

This paper aims to address the limitations that generalist Vision-Language-Action (VLA) foundation models face in long-horizon, dexterous, and precise robotic manipulation in real-world settings.

#Review #Robotic Manipulation #Reinforcement Learning #Vision-Language-Action #Dexterous Control #Long-Horizon Tasks #Data Filtering #Data Augmentation #Foundation Models

December 1, 2025

[Paper Review] VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation

This paper proposes VLA-4D, a spatiotemporally coherent VLA model for robotic manipulation, to address the spatiotemporal discontinuity and lack of fine-grained control in existing VLA models.

#Review #Vision-Language-Action Models #Robotic Manipulation #SpatioTemporal Coherence #4D Awareness #Visual Representation #Action Representation #Cross-Attention

November 23, 2025

[Paper Review] A Survey on Efficient Vision-Language-Action Models

This paper aims to address the problem that large Vision-Language-Action (VLA) models are hard to deploy in real robot environments because of their heavy compute and data requirements.

#Review #Embodied AI #Robotic Manipulation #VLA Models #Efficient AI #Model Compression #Efficient Training #Data Collection #Multimodal AI

November 9, 2025

[Paper Review] MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

The goal is to automatically generate realistic, task-relevant 3D tabletop scenes for robotic manipulation tasks. The work seeks to overcome the inefficiency and low realism of existing manual or random scene generation and to bridge the large gap between high-level task instructions and 3D scene layouts.

#Review #3D Scene Generation #Robotic Manipulation #Large Language Models #Spatial Reasoning #Dataset #Direct Preference Optimization #Tabletop Scene

September 29, 2025

[Paper Review] SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

This paper aims to address two fundamental problems that Vision-Language-Action (VLA) models face on robotic manipulation tasks: data scarcity and limited generalization. In particular, it explores how reinforcement learning (RL) can improve the long-horizon, step-by-step action planning of VLA models.

#Review #Reinforcement Learning (RL) #Vision-Language-Action (VLA) Models #Robotic Manipulation #Data Scarcity #Generalization #Sim-to-Real Transfer #Online RL #Long-Horizon Planning

September 12, 2025

[Paper Review] Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

This paper aims to close the 'seeing-to-doing gap' in robotic manipulation and improve generalization. To overcome data scarcity and the difficulty of transferring knowledge across diverse robot embodiments, it proposes 'pointing' as a general-purpose intermediate representation that connects vision-language understanding with low-level action primitives.

#Review #Embodied AI #Robotic Manipulation #Reinforcement Learning #Vision-Language Model #Pointing #Zero-shot Generalization

August 20, 2025

[Paper Review] Precise Action-to-Video Generation Through Visual Action Prompts

This paper aims to address the trade-off between precision and generality in video generation for complex, high-DoF interactions (e.g., human hand or robot gripper motions).

#Review #Action-to-Video Generation #Visual Action Prompts #Skeleton Representation #Human-Object Interaction #Robotic Manipulation #Cross-Domain Transfer #Diffusion Models

August 19, 2025

[Paper Review] Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

This paper presents Genie Envisioner, a unified world foundation platform for robotic manipulation that aims to integrate policy learning, evaluation, and simulation within a single video-generation framework. It seeks to move past the fragmented stages of conventional robot development toward scalable, general-purpose intelligent robot systems.

#Review #Robotic Manipulation #World Model #Video Generation #Diffusion Model #Embodied AI #Foundation Model #Robotics Simulation #Policy Learning

August 8, 2025

[Paper Review] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

This paper aims to address the problem of integrating multimodal reasoning with precise action generation so that robots can operate effectively in real-world environments.

#Review #Vision-Language-Action (VLA) #Instruction Tuning #Multimodal Reasoning #Robotic Manipulation #Catastrophic Forgetting #Mixture-of-Experts (MoE) #Flow Matching

August 5, 2025

[Paper Review] RoboOmni: Proactive Robot Manipulation in Omni-modal Context

This paper addresses the problem that existing robotic manipulation models depend on explicit instructions and are limited in proactively inferring human intent in real-world settings.

#Review #Robotic Manipulation #Multimodal LLMs #Vision-Language-Action #Proactive AI #Omni-modal Learning #Intent Recognition #Contextual Instructions

October 29, 2025

[Paper Review] World-in-World: World Models in a Closed-Loop World

This paper aims to address the problem that existing evaluation protocols for World Models (WMs) focus only on visual quality and fail to properly measure whether an embodied agent succeeds at tasks in a real environment.

#Review #World Models #Embodied AI #Closed-Loop Evaluation #Online Planning #Data Scaling #Controllability #Robotic Manipulation

October 22, 2025

[Paper Review] Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning

This work aims to address two main challenges in scaling Vision-Language-Action (VLA) models: scaling up efficiently by reusing pretrained VLA model weights, and balancing model capacity against computational efficiency for real-time control.

#Review #Vision-Language-Action (VLA) #Mixture of Experts (MoE) #Robotic Manipulation #Expert Specialization #Decoupled Routing #Load Balancing #Transfer Learning

October 17, 2025

[Paper Review] R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

This work aims to address the need for large amounts of human demonstration data to achieve spatial generalization in robotic manipulation.

#Review #Robotic Manipulation #Data Augmentation #Spatial Generalization #3D Data Generation #Imitation Learning #Point Cloud #Real-to-Real #Mobile Manipulation

October 10, 2025

[Paper Review] WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

Vision-Language-Action (VLA) models for robotic manipulation rely heavily on wrist-view observations that capture fine-grained hand-object interactions, but such wrist-view data is scarce in large-scale datasets.

#Review #4D World Models #Robotic Manipulation #Video Generation #Multi-view Synthesis #Visual-Language-Action (VLA) #Geometric Consistency #Diffusion Models #Wrist-View

October 9, 2025