#World Model

43 posts

[Paper Review] EgoSteer: A Full-Stack System Towards Steerable Dexterous Manipulation from Egocentric Videos

This paper aims to address the limitation that general robot manipulation models fail to achieve real-time steerability and remain confined to specific robot environments.

#Review #Steerable Dexterous Manipulation #VLA Models #Egocentric Videos #World Model #Robot Learning #DAgger

July 13, 2026

[Paper Review] PanoWorld: Real-World Panoramic Generation

This paper starts from the observation that existing panoramic world models struggle to maintain spatial and temporal consistency and physical accuracy in complex outdoor environments.

#Review #Panoramic Generation #World Model #Diffusion Model #Rotation Equivariance #Dense Panoramic Ray-Conditioning #Geometry-aware Memory #World360

July 12, 2026

[Paper Review] RynnWorld-Teleop: An Action-Conditioned World Model for Digital Teleoperation

This paper aims to address the bottleneck in large-scale data collection for robot learning, which is caused by the physical constraints and resource limits of real-world teleoperation.

#Review #Digital Teleoperation #World Model #Robotic Learning #Video Diffusion Transformer #Action-Conditioned Generation #Sim2Real Transfer #Imitation Learning

July 7, 2026

[Paper Review] GigaWorld-1: A Roadmap to Build World Models for Robot Policy Evaluation

Despite advances in robot foundation models, running policies on physical robots to evaluate their performance remains a key bottleneck that is both costly and time-consuming.

#Review #World Model #Robot Policy Evaluation #WMBench #Embodied AI #Video Generation #Policy Rollout

July 6, 2026

[Paper Review] Deform360: A Massive Multi-view Visuotactile Dataset for Deformable World Models

Deformable objects pose a major challenge for robot control and dynamics prediction because of their near-infinite degrees of freedom and complex physical properties.

#Review #Deformable Object #World Model #Visuotactile #3D Tracking #Robot Planning #Dataset #Gaussian Splatting

July 6, 2026

[Paper Review] WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

This paper proposes WorldDirector to address the failure of existing video world models to preserve the motion and identity of objects that leave the field of view.

#Review #World Model #Video Generation #Dynamic Memory #Object Permanence #Controllable Simulation #Flow Matching #Spatial-Aware Control

July 2, 2026

[Paper Review] DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

This paper proposes DreamForge-World 0.1 Preview, which enables real-time interactive simulation in compute-constrained environments.

#Review #World Model #Interactive Generation #Real-time #Consumer GPU #Autoregressive #Multimodal #LoRA

June 29, 2026

[Paper Review] Kairos: A Native World Model Stack for Physical AI

This paper starts from the need for current world models to evolve beyond simple video generators into foundational infrastructure for Physical AI.

#Review #Physical AI #World Model #Diffusion Transformer #Gated Linear Attention #Cross-Embodiment #Deployment-Aware #Embodied Control

June 17, 2026

[Paper Review] ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

Existing interactive world models focus mainly on locomotion and viewpoint control, and therefore fail to support meaningful object interaction. This "navigation-interaction gap" stems largely from two bottlenecks.

#Review #World Model #Interactive Generation #Action-Aware Memory #Chunk-Autoregressive #Video Diffusion #Embodied AI #Human-Object Interaction

June 16, 2026

[Paper Review] Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

This work proposes unified, language-based world modeling to overcome the limits of fragmented action representations and domain-specific simulation in robotics. Existing models are either overfit to a specific domain (e.g., manipulation or driving) or require robot-dependent control interfaces, which makes them hard to use as general-purpose robot learning environments.

#Review #Embodied Intelligence #World Model #Video Generation #Language-Conditioned Action #Double-Stream MMDiT #Embodied World Knowledge

June 15, 2026

[Paper Review] Geometric Action Model for Robot Policy Learning

This paper aims to address the limitation that existing Vision-Language-Action Models (VLAs) rely on 2D visual knowledge and therefore cannot explicitly reason about depth, scale, and occlusion in 3D physical manipulation environments.

#Review #Robot Policy Learning #Geometric Foundation Model #Vision-Language-Action Model #World Model #Causal Future Prediction #3D Geometry

June 15, 2026

[Paper Review] μ_0: A Scalable 3D Interaction-Trace World Model

This paper aims to resolve the data paradox facing robot learning, namely the gap between the scarcity of robot data that includes actions and the high availability of video data.

#Review #World Model #3D Interaction-Trace #Robot Manipulation #Cross-Embodiment Learning #Semantic Flow Matching #Data Pipeline

June 14, 2026

[Paper Review] WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

This paper aims to address the high latency and limited context length that existing world models suffer from when performing complex manipulation tasks.

#Review #World Model #Robotic Manipulation #Autoregressive Inference #Transformer #Efficiency #Generative Modeling

June 11, 2026

[Paper Review] MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

This paper aims to generate, from a single NFOV image, a spatially persistent 3D environment that users can freely move through and explore.

#Review #World Model #3D Gaussian Splatting #Panoramic Generation #Video Rendering #Real-Time Interaction

June 11, 2026

[Paper Review] Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

This paper aims to address the fundamental limitation that existing autonomous driving systems do not explicitly model action-conditioned dynamics and instead rely on a simple direct state-to-action mapping.

#Review #Autonomous Driving #World Model #Discrete Diffusion #Token Editing #Policy Learning #Counterfactual Reasoning

June 4, 2026

[Paper Review] Cosmos 3: Omnimodal World Models for Physical AI

Existing fragmented pipelines for training Physical AI agents separate the understanding and generation modules, which results in low data efficiency and poor scalability.

#Review #World Model #Physical AI #Mixture-of-Transformers #Omnimodal #Data-Driven Specialization #Synthetic Data #Action-Conditioned Generation

June 3, 2026

[Paper Review] DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

This paper proposes DecMem to address the lack of temporal consistency and the computational inefficiency that arise in long video generation.

#Review #World Model #Video Generation #Long-horizon Extrapolation #Memory Architecture #Sparse Retrieval #Attention Dispersion

May 31, 2026

[Paper Review] Learning POMDP World Models from Observations with Language-Model Priors

This work explores how an agent can effectively learn a world model in a strict POMDP setting, that is, a partially observable environment in which no ground-truth information about the latent state is provided.

#Review #POMDP #World Model #Large Language Models #Program Induction #Sample Efficiency #Partial Observability #Belief-based Filtering

May 17, 2026

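[Paper Review] SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

This paper aims to address the high computational cost of generating high-resolution videos longer than one minute and the difficulty of maintaining visual and geometric consistency over long durations. Existing world model research demands large-scale data and compute and often requires multi-GPU setups, which puts it out of reach for many academic and independent researchers.

#Review #World Model #Diffusion Transformer #Long-context Modeling #Camera Control #6-DoF Trajectory #Efficiency #Video Generation

May 14, 2026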

[Paper Review] INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

This paper aims to address the problem that existing video generation models fail to maintain long-term spatial consistency and offer only limited support for real-time interactive navigation.

#Review #World Model #Spatiotemporal Autoregressive #Diffusion Transformer #Camera Control #Distribution Matching Distillation

April 8, 2026

[Paper Review] InCoder-32B-Thinking: Industrial Code World Model for Thinking

This paper aims to address the problem that existing LLMs perform well on general coding tasks but lack reasoning ability in industrial software development, where hardware constraints and complex timing semantics matter.

#Review #Industrial Code Intelligence #Chain-of-Thought #World Model #Error-driven Synthesis #Hardware-aware Coding

April 5, 2026

[Paper Review] MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

Existing VLA models rely on hierarchical structures or the autoregressive paradigm, and as a result face architectural overhead, a lack of long-term temporal consistency, and the absence of an explicit mechanism for capturing environment dynamics.

#Review #Vision-Language-Action (VLA) #Discrete Diffusion #Multi-modal Generation #Robotic Manipulation #Action Chunking #World Model #Hybrid Attention

April 1, 2026

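[Paper Review] Learn2Fold: Structured Origami Generation with World Model Planning

Origami, which transforms a flat sheet into a complex 3D structure, is a demanding testbed for physical intelligence. It requires more than visual plausibility: the folds must obey geometric axioms and strict kinematic constraints, and it is a long-horizon reasoning task in which a small error can cause the entire structure to collapse.

#Review #Origami Generation #Neuro-symbolic Framework #World Model #Constraint-Aware Planning #Program Induction #Spatial Intelligence

March 31, 2026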

[Paper Review] Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

This paper aims to address the problem that the hundreds of latent tokens used by existing world models excessively increase the computational cost of real-time planning.

#Review #World Model #Discrete Tokenizer #Latent Representation #Action Planning #Model Predictive Control #Real-time AI #Compression #Vision Foundation Model

March 8, 2026

[Paper Review] WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

This paper aims to address the limitations that camera-guided video generation models (VDMs) face in reconstructing consistent 3D scenes, in particular limited camera control and content inconsistency across viewpoints.

#Review #Video Generation #3D Reconstruction #Camera Control #Diffusion Models #Geometric Memory #Multi-View Consistency #World Model

March 2, 2026

[Paper Review] World Guidance: World Modeling in Condition Space for Action Generation

This paper addresses the difficulty that Vision-Language-Action (VLA) models have in maintaining efficient, predictable future representations while preserving enough fine-grained information for precise action generation.

#Review #World Model #Action Generation #Vision-Language-Action Models (VLA) #Condition Space #Imitation Learning #Robotics #Generalization #Human Manipulation

February 25, 2026

[Paper Review] K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model

The goal is to address the problem that, because GPU kernel optimization is so complex, existing LLM-based evolutionary approaches are vulnerable to multi-step structural transformations and transient implementation defects.

#Review #LLM #GPU Kernel Optimization #Code Generation #World Model #Evolutionary Search #Program Synthesis #High-Performance Computing

February 23, 2026

[Paper Review] World Models for Policy Refinement in StarCraft II

This paper aims to improve the policy decision-making of large language model (LLM)-based agents in complex, partially observable real-time strategy (RTS) game environments such as StarCraft II (SC2).

#Review #StarCraft II #World Model #Policy Refinement #Large Language Models #Reinforcement Learning #Partial Observability #Structured Text Representation #Game AI

February 19, 2026

[Paper Review] Computer-Using World Model

This paper aims to address the problems that arise from agents' inability to reason about the consequences of their actions in complex software environments.

#Review #World Model #GUI Agents #Desktop Automation #Reinforcement Learning #Large Language Models #Visual State Realization #Textual State Transition

February 19, 2026

[Paper Review] Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

This paper aims to address the shortage of diverse and reliable environments for training large language model (LLM)-based agents.

#Review #Agentic AI #Reinforcement Learning #Synthetic Environments #Tool-Use Agents #World Model #Database-Backed Simulation #LLM-powered Agents

February 10, 2026

[Paper Review] SWE-World: Building Software Engineering Agents in Docker-Free Environments

The goal is to address the high resource consumption and limited scalability of the Docker-based physical execution environments on which the training and evaluation of software engineering (SWE) agents depend.

#Review #Software Engineering Agents #LLM #Docker-Free #Execution Simulation #Reinforcement Learning #Supervised Fine-tuning #World Model

February 3, 2026

[Paper Review] Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

This work aims to automate the complex legal workflow of verifying a venture capital capitalization table ("cap table tie-out").

#Review #Legal AI #Venture Capital #Due Diligence #Capitalization Table #Multi-document Reasoning #Knowledge Graph #World Model #Neuro-Symbolic AI

December 22, 2025

[Paper Review] Evaluating Gemini Robotics Policies in a Veo World Simulator

This paper overcomes the limitations of existing physics-based simulators in realism, scalability, and safety, and presents a new methodology for evaluating generalist robot policies.

#Review #Robotics #Policy Evaluation #World Model #Video Generation #Out-of-Distribution (OOD) #Safety #Gemini Robotics #Veo Simulator

December 11, 2025

[Paper Review] UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

The goal is to address the difficulties that autonomous driving systems face in long-tail scenarios because of limited world knowledge and insufficient visual dynamics modeling.

#Review #Autonomous Driving #End-to-End Learning #Vision-Language Models #World Model #Chain-of-Thought #Video Generation #Trajectory Planning #Multimodal Learning

December 10, 2025

[Paper Review] RynnVLA-002: A Unified Vision-Language-Action and World Model

This paper aims to unify VLA models and world models in a single framework in order to overcome the limitations of existing VLA models (insufficient understanding of action dynamics, lack of imagination and physical knowledge) and of world models (inability to generate actions directly).

#Review #Vision-Language-Action (VLA) Model #World Model #Robotics #Unified Framework #Multi-modal Learning #Action Generation #Attention Mask #Continuous Control

November 23, 2025

[Paper Review] NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

This paper aims to address the low reliability and poor generalization that Vision-Language-Action (VLA) models exhibit in real-world environments and across diverse robot platforms.

#Review #Vision-Language-Action Model #Direct Preference Optimization #World Model #Reward Learning #Robotics #Embodied AI #Flow-Matching

November 17, 2025

[Paper Review] WoW: Towards a World omniscient World model Through Embodied Interaction

This paper aims to overcome the limitation of existing video generation models that rely on passive observation (insufficient understanding of physical causality) and to develop a world model through which robots can acquire physical intuition from large-scale, causally rich real-world interaction data.

#Review #World Model #Embodied AI #Robotics #Diffusion Models #Physical Reasoning #Vision Language Models #Interaction Data #Self-Optimization

September 29, 2025

[Paper Review] Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

This paper aims to address the latency that existing interactive world models incur from bidirectional attention and lengthy inference steps, and to improve real-time performance.

#Review #World Model #Interactive Video Generation #Real-Time AI #Diffusion Models #Auto-Regressive Generation #Data Pipeline #Self-Forcing #KV Caching

August 19, 2025

[Paper Review] Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

This paper presents Genie Envisioner, a unified world foundation platform for robotic manipulation, with the aim of integrating policy learning, evaluation, and simulation within a single video-generation framework. It seeks to overcome the fragmented stages of existing robot development and to build scalable, general-purpose intelligent robot systems.

#Review #Robotic Manipulation #World Model #Video Generation #Diffusion Model #Embodied AI #Foundation Model #Robotics Simulation #Policy Learning

August 8, 2025

[Paper Review] Emu3.5: Native Multimodal Models are World Learners

This paper introduces Emu3.5, a large-scale multimodal world model that predicts the next state across vision and language. Through its native multimodal capabilities, it aims to improve long-sequence vision-language generation, X2I (Any-to-Image) generation, complex text-based image generation, and generalizable world modeling.

#Review #Multimodal Model #World Model #Vision-Language #Next-Token Prediction #Reinforcement Learning #Discrete Diffusion Adaptation #Image Generation #Any-to-Image

October 31, 2025

[Paper Review] Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

This paper points out that existing autonomous driving world models evaluate the effect of synthetic data on downstream perception tasks unfairly.

#Review #Synthetic Data Generation #Autonomous Driving #Perception Tasks #Diffusion Models #3D Asset Editing #World Model #Data Augmentation #nuScenes

October 30, 2025

[Paper Review] ODesign: A World Model for Biomolecular Interaction Design

ODesign aims to address the limitation that existing molecular design AI models are specialized to specific molecule types and lack fine-grained control over interaction details.

#Review #Biomolecular Interaction Design #Generative AI #World Model #Multimodal Molecular Design #All-atom Generation #Diffusion Models #Protein Design #Nucleic Acid Design

October 30, 2025

[Paper Review] GigaBrain-0: A World Model-Powered Vision-Language-Action Model

This paper aims to address the inefficiency and limited diversity of large-scale real-world robot data collection faced by VLA (Vision-Language-Action) models for general-purpose robots.

#Review #Vision-Language-Action Model #World Model #Data Augmentation #Robot Generalization #Embodied AI #RGBD #Chain-of-Thought

October 23, 2025