#Video Diffusion

15 posts

[Paper Review] 4D Human-Scene Reconstruction from Low-Overlap Captures

This paper tackles high-quality 4D human-scene reconstruction from only a few low-overlap cameras.

#Review #4D Reconstruction #Gaussian Splatting #Sparse-view #Video Diffusion #Human-Scene Decomposition #Multi-view Pose Estimation

July 13, 2026

[Paper Review] ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

Existing interactive world models focus mainly on locomotion and viewpoint control, so they cannot support meaningful object interaction. This "navigation-interaction gap" stems from two main bottlenecks.

#Review #World Model #Interactive Generation #Action-Aware Memory #Chunk-Autoregressive #Video Diffusion #Embodied AI #Human-Object Interaction

June 16, 2026

[Paper Review] Echo-Memory: A Controlled Study of Memory in Action World Models

This paper sets out to address the fundamental memory failures that occur in Action World Models. Because prior work used different backbones, training recipes, and evaluation protocols, memory performance could not be compared accurately.

#Review #Action World Models #Video Diffusion #Memory Mechanism #Open-domain Return #Replay Consistency #State-Space Memory #Context Compression

June 8, 2026

[Paper Review] VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

This paper aims to reduce the large KV cache memory cost incurred during streaming generation in autoregressive video diffusion models.

#Review #Video Diffusion #Multi-Head Latent Attention #KV Cache #Autoregressive Generation #Low-Rank Latent #Streaming Video #3D-RoPE

June 1, 2026

[Paper Review] LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

This paper aims to address error accumulation and identity drift during long-horizon generation in autoregressive (AR) video generation models. For efficiency, existing approaches rely solely on sliding-window attention and either discard early generated latents or use only fixed anchors.

#Review #Long Video Generation #Autoregressive #Retrieval-Augmented Generation #Video Diffusion #Temporal Consistency #Attention

June 1, 2026

[Paper Review] MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation

In traditional computer graphics (CG) pipelines, simulating animal fur and muscle dynamics is a labor-intensive process that demands deep expertise and substantial computing resources.

#Review #Video Diffusion #Animal Fur Simulation #Muscle Dynamics #Generative Dynamics Solver #Role-Aware RoPE #Asymmetric Decoupled Attention

May 28, 2026

[Paper Review] Bernini: Latent Semantic Planning for Video Diffusion

This paper observes that although modern MLLMs and video diffusion models offer strong reasoning and realistic synthesis respectively, a framework that integrates them effectively is still lacking.

#Review #Video Diffusion #Multimodal Large Language Models #Latent Semantic Planning #Diffusion Transformer #Video Editing #Chain-of-Thought

May 21, 2026

[Paper Review] Speculative Decoding for Autoregressive Video Generation

This paper proposes SDVG, a framework that uses an image-quality router to either accept each drafted block or regenerate it with the target model. The drafter produces candidate blocks in four denoising steps, and each block is scored with ImageReward using worst-frame aggregation.

#Review #Speculative Decoding #Autoregressive Video Generation #Video Diffusion #Training-free #ImageReward

April 21, 2026

[Paper Review] AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing

Recent generative video models excel at synthesizing adverse-weather scenarios for autonomous driving, but they require massive datasets to learn rare weather conditions.

#Review #Autonomous Driving #Weather Synthesis #G-buffer #3D-aware Editing #Neural Rendering #Video Diffusion #Relighting

March 31, 2026

[Paper Review] VideoWorld 2: Learning Transferable Knowledge from Real-world Videos

This work aims to learn transferable knowledge from unlabeled real-world video data for complex, long-horizon tasks.

#Review #Transferable Knowledge #Real-world Video Learning #Latent Dynamics Model #Video Diffusion #Robotics Manipulation #Long-horizon Tasks #Unlabeled Data

February 10, 2026

[Paper Review] KlingAvatar 2.0 Technical Report

This work aims to address the inefficiency, temporal drift, quality degradation, and prompt misalignment that arise when generating long-duration, high-resolution avatar videos.

#Review #Avatar Generation #Video Diffusion #Multi-modal LLM #Long-duration Video #High-resolution Video #Lip Synchronization #Multi-character Control #Spatio-temporal Cascade

December 15, 2025

[Paper Review] VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models

This paper tackles the generation of high-quality 3D scene videos from coarse 3D geometry, a camera trajectory, and a reference image.

#Review #3D Scene Generation #Video Diffusion #Image Diffusion #Generative Models #Computer Graphics #Temporal Consistency #Sparse Anchor Views

September 23, 2025

[Paper Review] Dress&Dance: Dress up and Dance as You Like It - Technical Preview

This paper aims to overcome the limitations of static 2D image-based virtual try-on and of existing video generation models by generating high-quality virtual try-on videos, 5 seconds long at 1152x720 resolution and 24 FPS, in which the user wears a chosen garment and performs a specified motion (a dance).

#Review #Virtual Try-On #Video Diffusion #Multi-modal Conditioning #Garment Transfer #Pose Animation #Generative AI #Fashion Tech #CondNet

August 29, 2025

[Paper Review] DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

This paper addresses the limitations of existing methods: excessive computational demands for long-duration video generation, a focus on long-horizon video synthesis without a 3D representation, or a restriction to static single-scene reconstruction.

#Review #Driving Scene Generation #Video Diffusion #3D Reconstruction #Gaussian Splatting #Feed-Forward Models #Temporal Coherence #Multimodal Control

October 20, 2025

[Paper Review] MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

This work aims to address the problem that existing image-to-video generation models achieve high visual fidelity but struggle to produce physically plausible and semantically consistent motion.

#Review #Image-to-Video Generation #Motion Transfer #Retrieval-Augmented Generation (RAG) #In-Context Learning #Diffusion Models #Video Diffusion #Motion Realism

October 1, 2025