# World Models

70 posts

[Paper Review] Infinite Worlds with Versatile Interactions

This paper addresses the inability of interactive world models to achieve real-time performance and long-term stability at the same time.

#Review #World Models #Causal Video Generation #Interactive Simulation #Agentic Harness #Diffusion Transformer #Long-horizon Stability

July 8, 2026

[Paper Review] Imagined Rollouts are Kinematic, Not Dynamic: A Diagnosis of Long-Horizon World-Model Failure

This paper demonstrates that the degradation modern world models suffer in long-horizon prediction is not simply the result of compounding error; it arises because the models fail to learn physical dynamics and structurally operate only at the level of kinematics.

#Review #World Models #Kinematic Fallback #iKCE #Long-Horizon Failure #Embodied AI #Dynamic Imagination

July 8, 2026

[Paper Review] PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

This paper addresses the physical implausibility that state-of-the-art video generation models exhibit on robotic manipulation tasks.

#Review #Embodied Intelligence #Video Generation #World Models #Physics-aware #Robotic Manipulation #Hierarchical Alignment

June 28, 2026

[Paper Review] Hallucination in World Models is Predictable and Preventable

This paper addresses hallucination in modern generative world models: although they generate highly realistic futures, their predictions drift away from the true dynamics.

#Review #World Models #Hallucination #Data Coverage #Visual Generative Modeling #Representation Learning #Curiosity-driven Data Collection

June 25, 2026

[Paper Review] World Value Models for Robotic Manipulation

This work starts from the observation that existing robot value models rely on static, image-based VLM backbones, which limits their ability to understand long-horizon temporal context and future outcomes.

#Review #World Models #Robotic Manipulation #Value Estimation #Flow Matching #Distributional Value #Suboptimal-Value-Bench

June 23, 2026

[Paper Review] Current World Models Lack a Persistent State Core

This paper points out that while modern world models can render detailed frames, they lack a "Persistent State Core": a world state that should keep evolving independently even when the observer is not looking.

#Review #World Models #Persistent State #Viewpoint Intervention #WRBench #Video Generation #Diagnostic Benchmark #World-State Consistency

June 18, 2026

[Paper Review] EgoCS-400K: An Egocentric Gameplay Dataset for World Models

This paper addresses the shortage of high-quality video-action-language datasets for training large-scale interactive world models.

#Review #World Models #Egocentric Video #Gaming Agent #Video Generation #Replay-grounded #Embodied AI

June 16, 2026

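[Paper Review] WorldOlympiad: Can Your World Model Survive a Triathlon?

This work is motivated by the observation that existing world-model evaluation is fragmented and does not adequately measure the compound capabilities required in real physical environments. Most current work is optimized for specific tasks, which makes it hard to assess generalization in changing environments or the understanding of complex causal relationships.

#Review #World Models #Benchmarking #Embodied AI #Generalization #Multimodal Evaluation #Simulator

June 9, 2026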

[Paper Review] Bridging the Agent-World Gap: Text World Models for LLM-based Agents

This paper addresses the Agent-World Gap: LLM-based agents fail to accurately predict how complex, dynamic environments change, which leads to failures.

#Review #LLM-based Agents #World Models #Text World Models #Environment Interaction #Planning #Sequential Decision Making

June 9, 2026

[Paper Review] WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

This paper aims to move beyond the static generation of existing video generation models and build active world models that users can interact with directly.

#Review #World Models #Interactive Video Generation #Object Manipulation #Camera Navigation #Embodied AI

June 8, 2026

[Paper Review] Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

This paper examines whether video generation models go beyond producing visually plausible videos and actually function as "world models" that internalize real physical laws.

#Review #Video Generation Models #Robotic Manipulation #Physical Executability #Benchmark #Sim-to-Real #World Models

June 4, 2026

[Paper Review] World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

This paper addresses the reliability problems that arise when world models are combined with MLLMs for forward-looking visual reasoning. Naive combinations have a key limitation: the generated rollouts are stochastic and sometimes inaccurate for the task, yet the agent has no effective way to control them.

#Review #World Models #Multimodal Large Language Models (MLLMs) #Controlled Concrete Reasoning #Privileged-Future On-Policy Self-Distillation (PF-OPSD) #Future Prediction #Simulation-Control

June 2, 2026

[Paper Review] YoCausal: How Far is Video Generation from World Model? A Causality Perspective

This paper examines whether state-of-the-art Video Diffusion Models (VDMs) are developing into world models in the true sense, or are merely overfitting to statistical temporal patterns.

#Review #Video Generation #World Models #Causality #Violation of Expectation #Reverse Surprise Index #Causality Cognition Index #Diffusion Models

May 28, 2026

[Paper Review] WorldKV: Efficient World Memory with World Retrieval and Compression

This paper tackles the problem of implementing long-term memory with spatial and temporal consistency in autoregressive video models while preserving real-time performance.

#Review #World Models #Autoregressive Video Diffusion #KV Cache Management #World Retrieval #World Compression #Real-time Inference #Long-term Consistency

May 21, 2026

[Paper Review] Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

This paper argues that real-time interactive video generation needs to move to frame-wise, ultra-low-latency generation with only 1–2 denoising steps. Prior work mainly adopted chunk-wise 4-step generation, which limits real-time responsiveness, and properly initializing a few-step AR student model is the bottleneck.

#Review #Autoregressive Diffusion #Diffusion Distillation #Real-time Video Generation #Causal Consistency Distillation #Few-Step Inference #World Models

May 14, 2026

[Paper Review] Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

This paper addresses two problems of existing unified world models: they are confined to 2D pixel space and therefore lack an understanding of geometric structure, and they fail to strike an efficient balance between high-dimensional video generation and low-dimensional action prediction.

#Review #Embodied AI #World Models #Diffusion Transformer #3D Reconstruction #Robotic Manipulation #Asynchronous Denoising #Unified Modeling

April 29, 2026

[Paper Review] Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

This paper proposes the Cortex 2.0 architecture, which integrates a world model into the robot control loop to add future-prediction and evaluation steps. Given the current observation, the model uses the world model to generate $k$ candidate future trajectories in a visual latent space.

#Review #Vision-Language-Action Models #World Models #Robotic Manipulation #Plan-and-Act #Process-Reward Operator #Flow Matching #Cross-Embodiment

April 22, 2026
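The Cortex 2.0 summary describes a predict-then-evaluate step: imagine $k$ candidate futures in a latent space and act on the most promising one. The following is a minimal, hypothetical sketch of that general sampling-based planning pattern, not the paper's implementation. `toy_world_model` is a made-up linear latent dynamics function and `toy_score` is a stand-in for whatever evaluator the paper actually uses (the tags mention a process-reward operator).

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_world_model(z: Vector, action: Vector) -> Vector:
    """Hypothetical latent dynamics: decay the state, then add the action."""
    return [0.9 * zi + ai for zi, ai in zip(z, action)]


def rollout(z0: Vector, actions: Sequence[Vector]) -> List[Vector]:
    """Imagine a trajectory in latent space without touching the real robot."""
    traj = [list(z0)]
    for a in actions:
        traj.append(toy_world_model(traj[-1], a))
    return traj


def toy_score(traj: List[Vector], goal: Vector) -> float:
    """Hypothetical evaluator: negative distance from the final latent to the goal."""
    return -math.dist(traj[-1], goal)


def plan_with_k_candidates(
    z0: Vector, goal: Vector, k: int = 8, horizon: int = 5, seed: int = 0
) -> Tuple[int, List[Vector], List[float]]:
    """Sample k action sequences, imagine each rollout, and keep the best-scoring one."""
    rng = random.Random(seed)
    dim = len(z0)
    candidates = [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(horizon)]
        for _ in range(k)
    ]
    scores = [toy_score(rollout(z0, acts), goal) for acts in candidates]
    best = max(range(k), key=scores.__getitem__)
    return best, candidates[best], scores
```

For example, `plan_with_k_candidates([0.0, 0.0], goal=[1.0, -1.0])` returns the index of the winning candidate, its action sequence, and all eight scores. Random shooting is used here only because it is the simplest way to show the "generate $k$ futures, then evaluate" structure.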

[Paper Review] Neural Computers

This paper builds neural computer (NC) prototypes for CLI and GUI environments on top of the recent video generation model Wan2.1, applying interface-specific data engines and training recipes. The model runs an update-and-render loop: it updates a latent state from the given input and then generates the next frame.

#Review #Neural Computer #World Models #Interactive Video Generation #Latent Runtime State #CNC #CLI/GUI Interfaces

April 8, 2026
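The update-and-render loop in the Neural Computers summary can be illustrated with a deliberately tiny, hypothetical stand-in. Here the "latent state" is an explicit terminal buffer and `update`/`render` are hand-written functions, whereas in the paper both roles are played by a learned video model built on Wan2.1. The sketch shows only the control flow of the loop: one input event in, one state update, one rendered frame out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LatentState:
    """Hypothetical stand-in for the model's latent runtime state."""
    history: List[str] = field(default_factory=list)  # committed command lines
    prompt: str = ""  # text typed on the current line


def update(state: LatentState, key: str) -> LatentState:
    """Fold one input event into a new state (the 'update' half of the loop)."""
    if key == "\n":
        return LatentState(state.history + ["$ " + state.prompt], "")
    if key == "\b":
        return LatentState(list(state.history), state.prompt[:-1])
    return LatentState(list(state.history), state.prompt + key)


def render(state: LatentState, height: int = 3) -> str:
    """Produce the next 'frame' from the state (the 'render' half of the loop)."""
    rows = state.history + ["$ " + state.prompt + "_"]
    return "\n".join(rows[-height:])


def run(keys: str, state: Optional[LatentState] = None) -> Tuple[LatentState, List[str]]:
    """Drive the update-and-render loop: one frame per input event."""
    if state is None:
        state = LatentState()
    frames = []
    for key in keys:
        state = update(state, key)
        frames.append(render(state))
    return state, frames
```

For example, `run("ls\n")` yields three frames, the last of which is the two-line screen `$ ls` followed by `$ _`. Keeping `update` and `render` separate mirrors the summary's distinction between maintaining state and producing pixels.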

[Paper Review] OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

This paper proposes OpenWorldLib to resolve the conceptual ambiguity surrounding world models and to establish a standardized definition and a unified framework.

#Review #World Models #Unified Inference Framework #Multimodal Reasoning #Vision-Language-Action #3D Generation #Interactive Video Generation

April 6, 2026

[Paper Review] Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

Existing benchmarks for video-based world models either focus narrowly on visual fidelity and text-video alignment, or rely on static 3D reconstruction metrics that fundamentally ignore temporal dynamics.

#Review #World Models #4D Generation #Interactive Response #Evaluation Benchmark #Omni-WorldSuite #Omni-Metrics #AgenticScore #Causal Consistency

March 23, 2026

[Paper Review] FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

To address the O(N²) compute cost of Transformer-based world models and their lack of spatial inductive bias, this paper proposes FluidWorld, which uses a reaction-diffusion partial differential equation as its predictive dynamics.

#Review #World Models #Reaction-Diffusion PDE #Video Prediction #Latent Dynamics #Spatial Inductive Bias #Computational Efficiency

March 23, 2026
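As a rough illustration of using a reaction-diffusion PDE, $\partial_t u = D\,\nabla^2 u + f(u)$, as predictive dynamics, the sketch below advances a 2D latent grid with explicit Euler steps and a 5-point Laplacian with periodic boundaries. Each cell update reads only its four neighbors, so one step costs O(N) in the number of grid cells, compared with the O(N²) pairwise interactions of full self-attention. The reaction term `f(u) = u - u**3`, the diffusion coefficient, and the step size are assumptions chosen for illustration, not FluidWorld's actual formulation.

```python
from typing import Callable, List

Grid = List[List[float]]


def laplacian(u: Grid) -> Grid:
    """5-point discrete Laplacian with periodic (wrap-around) boundaries."""
    h, w = len(u), len(u[0])
    return [
        [
            u[(i - 1) % h][j] + u[(i + 1) % h][j]
            + u[i][(j - 1) % w] + u[i][(j + 1) % w]
            - 4.0 * u[i][j]
            for j in range(w)
        ]
        for i in range(h)
    ]


def reaction(x: float) -> float:
    """Hypothetical bistable reaction term with stable states at -1 and +1."""
    return x - x ** 3


def rd_step(
    u: Grid,
    diffusion: float = 0.1,
    dt: float = 0.1,
    f: Callable[[float], float] = reaction,
) -> Grid:
    """One explicit Euler step of u_t = D * lap(u) + f(u); needs dt * D <= 0.25 for stability."""
    lap = laplacian(u)
    return [
        [u[i][j] + dt * (diffusion * lap[i][j] + f(u[i][j])) for j in range(len(u[0]))]
        for i in range(len(u))
    ]


def predict(u: Grid, steps: int, **kwargs) -> Grid:
    """Roll the latent field forward by repeatedly applying the local PDE update."""
    for _ in range(steps):
        u = rd_step(u, **kwargs)
    return u
```

With the reaction term switched off (`f=lambda x: 0.0`), a single spike spreads to its neighbors while the grid total is conserved, which is a quick sanity check that the diffusion stencil is correct.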

[Paper Review] MosaicMem: Hybrid Spatial Memory for Controllable Video World Models

Video diffusion models are evolving from generators of merely plausible clips into world simulators that must remain consistent under camera motion, revisits, and interventions.

#Review #Spatial Memory #World Models #Video Diffusion Models #Hybrid Memory #Controllable Video Generation #Long-horizon Consistency #Patch-and-Compose

March 18, 2026

[Paper Review] Reward Prediction with Factorized World States

This work aims to develop an accurate reward prediction model that lets AI agents generalize across new goals and environments. In particular, it addresses the training-data bias and limited generalization of existing supervised reward models, as well as the lack of benchmarks for fine-grained, step-level reward evaluation.

#Review #Reward Prediction #World Models #State Representation #Large Language Models #Zero-shot Learning #Reinforcement Learning #Planning #Factorization

March 10, 2026

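[Paper Review] WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching

This work aims to reduce the high inference cost of diffusion-based world models, particularly the cost of interactive use and long rollouts. Caching policies designed for single-modality diffusion models do not transfer well to world models because of the heterogeneity of multimodal tokens and their non-uniform temporal dynamics, and the paper sets out to overcome this limitation.

#Review #World Models #Diffusion Models #Inference Acceleration #Feature Caching #Heterogeneous Tokens #Curvature Prediction #Adaptive Skipping

March 8, 2026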

[Paper Review] Next Embedding Prediction Makes World Models Stronger

The goal is to improve the ability of model-based reinforcement learning (MBRL) agents to capture long-term temporal dependencies in partially observable, high-dimensional environments.

#Review #Model-Based Reinforcement Learning #World Models #Decoder-Free #Temporal Transformer #Next-Embedding Prediction #Latent Representation #Partial Observability #Barlow Twins

March 3, 2026

[Paper Review] Chain of World: World Model Thinking in Latent Motion

The paper aims to overcome two limitations of existing VLA (Vision-Language-Action) models, namely weak predictive ability and the inefficiency of reconstructing visually redundant content, and to address the lack of continuous dynamics modeling and world knowledge in latent action models.

#Review #Vision-Language-Action Models #World Models #Latent Motion #Embodied Intelligence #Temporal Reasoning #Disentangled Representation #Robotics #Pretraining

March 3, 2026

[Paper Review] The Trinity of Consistency as a Defining Principle for General World Models

This paper addresses the problem that recent generative AI models produce visually plausible outputs yet show limited understanding of physical laws and causal relationships.

#Review #World Models #Multimodal Generative AI #Consistency Theory #Spatial-Temporal Reasoning #Causal Simulation #AI Benchmarking #Artificial General Intelligence

February 26, 2026

[Paper Review] Causal-JEPA: Learning World Models through Object-Level Latent Interventions

The paper addresses the limitation that existing object-centric world models fail to capture interaction-dependent dynamics and instead rely on self-dynamics or spurious correlations.

#Review #World Models #Object-Centric Representations #Latent Interventions #Masked Prediction #Causal Inductive Bias #Joint Embedding Predictive Architecture (JEPA) #Visual Question Answering (VQA) #Model Predictive Control (MPC)

February 17, 2026

[Paper Review] RISE: Self-Improving Robot Policy with Compositional World Model

This paper addresses two limitations: VLA (Vision-Language-Action) models remain brittle on contact-rich, dynamic robot manipulation tasks, and on-policy reinforcement learning in the physical world is hard to scale because of hardware costs, slow interaction, and manual resets.

#Review #Robot Learning #Reinforcement Learning #World Models #Compositional Models #Robotic Manipulation #Self-Improving #Vision-Language-Action (VLA)

February 12, 2026

[Paper Review] GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

This paper aims to overcome the limits on long-horizon action planning in current VLA (Vision-Language-Action) models, which stem from their limited scene understanding and weak future prediction.

#Review #VLA Models #World Models #Reinforcement Learning #Robotic Manipulation #Long-Horizon Control #Human-in-the-Loop #Continual Learning

February 12, 2026

[Paper Review] WorldCompass: Reinforcement Learning for Long-Horizon World Models

This paper proposes WorldCompass, a reinforcement learning (RL) based post-training framework that improves the long-horizon exploration accuracy and consistency of interactive video-based world models.

#Review #Reinforcement Learning #World Models #Video Generation #Autoregressive Generation #Long-Horizon #Post-training #Diffusion Models #Reward Functions

February 9, 2026

[Paper Review] OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

The paper points out that current LLM agent evaluation concentrates on the deductive paradigm and therefore does a poor job of measuring inductive ability: an agent's capacity to autonomously discover the hidden rules of an environment.

#Review #LLM Agents #Benchmarking #Inductive Reasoning #Long-Horizon Tasks #Active Exploration #World Models #Autonomous Discovery

February 8, 2026

[Paper Review] Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

The goal is to overcome the limitations of AI research that injects world knowledge piecemeal, and to propose a unified design framework for world models that enables integrated, holistic understanding of the world.

#Review #World Models #Unified Framework #Multimodal AI #Embodied AI #Physical Understanding #Long-term Consistency #AI Agents #Generative Models

February 3, 2026

[Paper Review] Advancing Open-source World Models

This paper aims to overcome the limitations of existing video generation models (data scarcity, weak long-term consistency, difficulty with real-time interaction, and reliance on proprietary solutions) by developing LingBot-World, an open-source world model that learns the dynamics of virtual worlds and renders them in real time.

#Review #World Models #Open-source AI #Video Generation #Real-time Simulation #Long-term Memory #Action-Conditioned Learning #Generative Models #Embodied AI

January 28, 2026

[Paper Review] Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

This paper addresses the problem that existing AI systems excel in linguistic and abstract domains but lag behind humans in physical and spatial intelligence, owing to a lack of rich representations and prior knowledge, especially explicit visual world modeling.

#Review #Multimodal AI #World Models #Visual Generation #Chain-of-Thought (CoT) #Multimodal Reasoning #Unified Multimodal Models #Spatial-Physical Reasoning

January 27, 2026

[Paper Review] Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

This paper aims to leverage the spatiotemporal priors of large-scale pretrained video generation models for robot policy learning.

#Review #Video Models #Visuomotor Control #Robot Policy #Fine-tuning #Diffusion Models #World Models #Model-based Planning #Imitation Learning

January 22, 2026

[Paper Review] Aligning Agentic World Models via Knowledgeable Experience Learning

This paper addresses the "physical hallucinations" suffered by agentic world models built on large language models (LLMs).

#Review #Agentic AI #World Models #Experience Learning #LLMs #Physical Hallucinations #Embodied AI #Predictive Coding #Knowledge Repository

January 20, 2026

[Paper Review] Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

The goal is to fix failures of large language model (LLM) based agents that, because of "shallow grounding," cannot anticipate the long-term consequences of their actions.

#Review #LLM Agents #World Models #Adaptive Planning #Lookahead #Reinforcement Learning #POMDP #Task Planning #Reasoning

January 14, 2026

[Paper Review] Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

The goal is to address the difficulty existing video generation "world models" have in specifying precise goals for complex physical tasks.

#Review #Video Generation #World Models #Physics-Conditioned Goals #Causal Planning #Force Vectors #Zero-Shot Generalization #Diffusion Models #Robotics Planning

January 11, 2026

[Paper Review] SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling

This paper addresses a key bottleneck in surgical robot learning: the scarcity of visual observations paired with accurate robot motion data. Large amounts of surgical video exist, but they lack robot action labels, which makes them hard to use directly for imitation learning. The paper therefore aims to learn generalizable, data-efficient surgical robot policies through a world model.

#Review #Surgical Robotics #World Models #Video Generation #Imitation Learning #Inverse Dynamics Model #Synthetic Data #Vision-Language-Action Models #Data Scarcity

December 29, 2025

[Paper Review] Act2Goal: From World Model To General Goal-conditioned Policy

This paper addresses two problems that existing goal-conditioned policies (GCPs) face in long-horizon robotic manipulation: difficulty maintaining long-horizon consistency and insufficient responsiveness to local perturbations.

#Review #Goal-Conditioned Policy #World Models #Robotic Manipulation #Multi-Scale Temporal Hashing #Online Adaptation #Hindsight Experience Replay #LoRA Finetuning #Zero-shot Generalization

December 29, 2025

[Paper Review] Active Intelligence in Video Avatars via Closed-loop World Modeling

The goal is to move existing video avatar generation beyond simple animation, addressing its lack of autonomous agency and its inability to accomplish long-term goals.

#Review #Video Avatars #Active Intelligence #World Models #Closed-loop Reasoning #POMDP #Generative AI #Hierarchical Planning #Cognitive Architecture

December 23, 2025

[Paper Review] The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

This paper aims to overcome the limitations of existing text-only or trajectory-based image-to-video (I2V) generation models and to enable richer, user-directed simulation of "promptable world events."

#Review #World Models #Video Generation #Multimodal Control #Trajectory Guidance #Reference Images #Promptable Events #Cross-Attention #Diffusion Models

December 18, 2025

[Paper Review] MMGR: Multi-Modal Generative Reasoning

This paper addresses a gap in evaluating large-scale text-to-video models: measuring reasoning ability rather than only perceptual fidelity.

#Review #Multi-Modal Generative Models #Reasoning Evaluation #World Models #Physical Commonsense #Abstract Reasoning #Embodied Navigation #VLM-based Evaluation #Temporal Consistency

December 16, 2025

[Paper Review] Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

This paper addresses the limitations of existing 3D Gaussian Splatting (3DGS) viewers: high deployment friction caused by fragmentation, heavyweight implementations, and legacy-pipeline constraints, as well as a lack of support for dynamic content and generative models.

#Review #Neural Rendering #3D Gaussian Splatting #WebGPU #ONNX Inference #World Models #Real-time Rendering #Browser-based #Dynamic Scenes

December 9, 2025

[Paper Review] MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

This paper aims to synthesize physically plausible, logically consistent long-horizon robot manipulation videos, overcoming both the scarcity of diverse long-horizon manipulation data and the limitations of existing video generation models. In particular, it focuses on enabling autonomous data synthesis without relying on manually defined trajectories.

#Review #Video Generation #Robotic Manipulation #Hierarchical Framework #Reinforcement Learning #Diffusion Models #World Models #Cognitive Science #Physical Alignment

December 9, 2025

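[Paper Review] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

The paper points out that existing video generation models are limited in understanding the world holistically because they condition on a single modality and cover only a narrow range of modalities. To overcome this, it aims to improve world-aware video generation by unifying multiple modalities (segmentation masks, human skeletons, DensePose, optical flow, depth maps) and multiple training paradigms.

#Review #Video Generation #Multi-modal Learning #Multi-task Learning #Zero-shot Generalization #Diffusion Models #World Models #Video Understanding

December 8, 2025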

[Paper Review] TV2TV: A Unified Framework for Interleaved Language and Video Generation

This paper seeks to overcome the limitations existing models face when video generation requires complex semantic reasoning or repeated high-level planning. By decomposing video generation into an interleaved process of text generation and video generation, it aims to substantially improve visual quality and user controllability.

#Review #Video Generation #Language Modeling #Multimodal AI #Interleaved Generation #Flow Matching #Transformer #Controllability #World Models

December 4, 2025

[Paper Review] EgoLCD: Egocentric Video Generation with Long Context Diffusion

The paper addresses content drift and the difficulty of managing long-term memory under computational constraints when generating long, consistent egocentric (first-person) videos.

#Review #Egocentric Video Generation #Long-Context Diffusion #Long-Short Memory #Sparse KV Cache #Memory Regulation Loss #Structured Narrative Prompting #World Models #Embodied AI

December 4, 2025

[Paper Review] Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

This work aims to answer a fundamental question: does joint audio-video denoising training improve video generation performance even when only video quality is of interest?

#Review #Video Generation #Audio-Video Multimodal #Joint Denoising #Diffusion Models #Transformer Architecture #World Models #Physical Commonsense #Multimodal Training

December 2, 2025

[Paper Review] GigaWorld-0: World Models as Data Engine to Empower Embodied AI

This paper develops GigaWorld-0, a unified world model framework, with the aim of using it as a scalable, data-efficient data engine for embodied AI.

#Review #World Models #Embodied AI #Data Generation #Video Generation #3D Scene Reconstruction #Robotics #Vision-Language-Action

November 25, 2025

[Paper Review] Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?

This paper aims to quantitatively evaluate how well state-of-the-art world models (WMs) perform mapless path planning toward implicit semantic targets specified in text, in real-world environments.

#Review #World Models #Mapless Navigation #Semantic Path Planning #Robot Learning #Video Prediction #Benchmark #Trajectory Generation

November 24, 2025

[Paper Review] SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

The goal is to address the severe reward sparsity in reinforcement learning (RL) for Vision-Language-Action (VLA) models, and to achieve high training efficiency and generalization without external expert demonstrations or manual reward engineering.

#Review #Reinforcement Learning #Vision-Language-Action Models #Reward Shaping #World Models #Self-Referential Learning #Robotics #Trajectory Optimization

November 20, 2025

[Paper Review] Simulating the Visual World with Artificial Intelligence: A Roadmap

This paper systematically surveys how video generation models are evolving into comprehensive physical world models and lays out a roadmap for that evolution.

#Review #World Models #Video Generation #AI Simulation #Generative AI #Physical Plausibility #Interactive AI #Planning #Roadmap

November 16, 2025

[Paper Review] WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

VLA models show great promise for robotic manipulation, but because they rely on expert demonstrations, their ability to learn from failure and self-correct is limited; the paper sets out to address this.

#Review #Vision-Language-Action (VLA) #Reinforcement Learning (RL) #Model-based RL #World Models #Policy Optimization #Robotics #Sample Efficiency #Self-correction

November 12, 2025

[Paper Review] 10 Open Challenges Steering the Future of Vision-Language-Action Models

This paper identifies and discusses ten major open challenges facing current research, with the aim of accelerating the development and broad adoption of Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #Embodied AI #Robotics #Multimodal Perception #Cross-Robot Generalization #Hierarchical Planning #World Models #Robot Safety

November 10, 2025

[Paper Review] Scaling Agent Learning via Experience Synthesis

The goal is to address the problems facing reinforcement learning (RL) training of large language model (LLM) agents: high cost, limited task diversity, unstable reward signals, and complex infrastructure. The paper proposes a unified framework that enables effective, scalable RL training while reducing the need for real-environment interaction.

#Review #Reinforcement Learning #LLM Agents #Experience Synthesis #World Models #Curriculum Learning #Sim-to-Real Transfer #Web Agents

November 9, 2025

[Paper Review] How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment

This work evaluates how well state-of-the-art video generation models (potential world models) can simulate the real world in the high-stakes surgical domain, where deep, specialized causal knowledge is required.

#Review #Video Generation #World Models #Surgical AI #Zero-shot Prediction #Expert Evaluation #Plausibility Gap #Medical Simulation

November 9, 2025

[Paper Review] Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model

This paper addresses the modality conflict inherent in jointly predicting next observations and action sequences in world-model-augmented Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #World Models #Diffusion Models #Multimodal Learning #Robotics #Asynchronous Sampling #Diffusion Transformers

November 9, 2025

[Paper Review] LongCat-Video Technical Report

This paper presents LongCat-Video, a 13.6B-parameter foundation video generation model focused on efficient, high-quality long-video generation.

#Review #Video Generation #Diffusion Transformer #RLHF #Sparse Attention #Long Video Generation #Coarse-to-Fine Generation #Multi-task Learning #World Models

October 28, 2025

[Paper Review] WorldGrow: Generating Infinite 3D World

The paper tackles the core challenge of generating infinitely extendable 3D worlds with coherent geometry and realistic appearance.

#Review #3D World Generation #Infinite Scene Synthesis #Block-wise Generation #Coarse-to-Fine #3D Inpainting #Structured Latent Representation #Virtual Environments #World Models

October 27, 2025

[Paper Review] PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis

The goal is to overcome the data scarcity that makes it hard to learn physically consistent dynamics models of deformable objects from limited real video, and to build a world model that is both accurate and fast at inference. In particular, the paper focuses on modeling objects whose physical properties vary over space and time.

#Review #World Models #Deformable Objects #Physics Simulation #GNN #Digital Twin #Data Synthesis #Real-to-Sim #Physics-Aware Learning

October 27, 2025

[Paper Review] From Masks to Worlds: A Hitchhiker's Guide to World Models

This paper lays out a clear roadmap toward building a "true world model," going beyond a mere catalogue of existing models.

#Review #World Models #Generative AI #Multimodal Learning #Masked Modeling #Interactive AI #Memory Systems #Autonomous Agents #AI Roadmap

October 24, 2025

[Paper Review] OmniNWM: Omniscient Driving Navigation World Models

This paper aims to develop a comprehensive, omniscient panoramic navigation world model for autonomous driving by addressing the limitations of existing driving world models: restricted state modalities, short sequence lengths, imprecise action control, and a lack of reward awareness.

#Review #Autonomous Driving #World Models #Multi-modal Generation #3D Occupancy #Plücker Ray-maps #Action Control #Dense Rewards #Long-term Forecasting

October 23, 2025

[Paper Review] World-in-World: World Models in a Closed-Loop World

This paper addresses the problem that existing world model (WM) evaluation protocols focus on visual quality alone and fail to measure whether embodied agents actually succeed at tasks in real environments.

#Review #World Models #Embodied AI #Closed-Loop Evaluation #Online Planning #Data Scaling #Controllability #Robotic Manipulation

October 22, 2025

[Paper Review] LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

This paper addresses the scarcity of large-scale, high-quality training trajectories in UI environments for digital agents. To overcome the high cost and limited scalability of existing data collection, it proposes a scalable paradigm that uses an LLM-based simulator to synthesize diverse UI states and transitions.

#Review #LLM #Digital Agents #UI Simulation #Synthetic Data Generation #Targeted Data Synthesis #World Models

October 17, 2025

[Paper Review] PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

This paper addresses the problem that state-of-the-art video generation models produce visually realistic videos but fail to obey physical laws. Its ultimate goal is to integrate physical knowledge into video generation models so that they produce physically plausible videos, advancing them from mere content generators to "world models."

#Review #Video Generation #Physical Plausibility #Reinforcement Learning #Direct Preference Optimization #Physical Representation #Diffusion Models #World Models #Image-to-Video

October 16, 2025

[Paper Review] CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

This paper addresses the poor generalization and weak handling of long-tail scenarios in autonomous driving models that rely solely on imitation learning (IL). To also overcome the sample inefficiency and unstable convergence of reinforcement learning (RL), it aims to develop a more robust, better-generalizing driving policy by effectively integrating IL and RL.

#Review #Autonomous Driving #Imitation Learning #Reinforcement Learning #World Models #Latent Space #Dual-Policy #Competitive Learning

October 16, 2025

[Paper Review] Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

The goal is to help AI agents overcome their current limitations on complex, long-horizon interactive tasks through "vicarious trial and error": mentally simulating the environment to improve reasoning and decision-making.

#Review #AI Agents #Reinforcement Learning #World Models #Simulation #Reasoning #Language Models #Planning #Interactive AI

October 13, 2025

[Paper Review] VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

This paper aims to overcome the limitations of imitation learning (error accumulation, poor robustness to distribution shift) and the drawbacks of conventional reinforcement learning (high cost, the sim-to-real gap).

#Review #Vision-Language-Action Models #Reinforcement Learning #World Models #Fine-tuning #Embodied AI #Robotics #Reward Design #Distribution Shift

October 2, 2025