#Autoregressive Generation

35 posts

[Paper Review] Vidu S1: A Real-Time Interactive Video Generation Model

This paper proposes Vidu S1 to address the lack of interactivity and real-time responsiveness in the existing offline generation paradigm. Most existing video generation models rely on a one-shot approach that generates all frames at once, so users cannot intervene in the generation process in real time.

#Review #Real-time Video Generation #Speech-Guided Control #Infinite-Length Inference #TurboDiffusion #TurboServe #Autoregressive Generation

July 9, 2026

[Paper Review] Flex-Forcing: Towards a Unified Autoregressive and Bidirectional Video Diffusion Model

Existing video generation models are split into two separate paradigms, bidirectional diffusion models and autoregressive models, and each has distinct strengths and weaknesses.

#Review #Video Diffusion Models #Autoregressive Generation #Bidirectional Generation #Flexible Chunking #Denoising Timesteps #KV Caching #Any-order Editing

July 7, 2026

[Paper Review] LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

This paper addresses the attention distribution shift and spatial-temporal token redundancy that arise in real-time streaming video editing.

#Review #Streaming Video Editing #Diffusion Models #Distillation #Real-Time Inference #Attention Distribution #Mask Cache #Autoregressive Generation

June 29, 2026

[Paper Review] Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

This paper aims to solve the low efficiency and error accumulation of autoregressive image generation performed directly in pixel space.

#Review #Autoregressive Generation #Pixel-Space #Parallel Rollout Approximation #Continuous-Token #Diffusion Head #Intermediate States #Train-Inference Mismatch

June 28, 2026

[Paper Review] Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

This paper addresses a bottleneck in realistic physics simulation: existing 3D assets lack the material information that simulation requires (Young's modulus, Poisson's ratio, density).

#Review #Mechanical Properties #Sparse Adaptive Voxels #Physics Simulation #Autoregressive Generation #3D Assets #Material Fields

June 18, 2026

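[Paper Review] MotionVLA: Vision-Language-Action Model for Humanoid Motion

This paper addresses the problem that motion tokenization based on a single codebook is biased toward low-frequency pose information and fails to represent high-frequency physical dynamics. Most prior work discretizes motion as one unified sequence, which ignores the different statistical characteristics of joint positions (low-frequency) and velocities (high-frequency).

#Review #Vision-Language-Action #Humanoid Motion #Frequency-Domain Tokenizer #Autoregressive Generation #Dual-Stream Representation #MotionVLA

June 16, 2026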

[Paper Review] Memento: Reconstruct to Remember for Consistent Long Video Generation

This paper addresses the problem of a character's identity becoming distorted or lost over time in long video generation. Existing models based on temporal decomposition optimize only the visual continuity of the next shot and lack an explicit signal for preserving the character's identity.

#Review #Long Video Generation #Subject Consistency #Diffusion Models #Memory Bank #Identity Grounding #Autoregressive Generation

June 15, 2026

[Paper Review] BadWorld: Adversarial Attacks on World Models

This paper proposes BadWorld, the first adversarial attack framework for evaluating the potential vulnerabilities of visual world models (VWMs).

#Review #Adversarial Attack #Visual World Models #Autoregressive Generation #Flow Matching #Trajectory-Adaptive Optimization #Label-Free

June 15, 2026

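[Paper Review] IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

This paper addresses the fundamental trade-off between reconstruction quality and semantic preservation that bottlenecks representation autoencoders (RAEs) built on vision foundation models (VFMs). Prior work relies mainly on semantic information from deep layers, which causes fine visual attributes (color, text, local structure, etc.) to be lost.

#Review #Representation Autoencoder #Vision Foundation Models #Vector Quantization #Autoregressive Generation #Semantic Preservation #Reconstruction Fidelity

June 11, 2026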

[Paper Review] Next Forcing: Causal World Modeling with Multi-Chunk Prediction

This paper addresses the high latency and computational inefficiency of existing autoregressive models when they generate long sequences. Because conventional models must generate tokens one at a time, bottlenecks arise when simulating complex environments or generating long contexts.

#Review #World Modeling #Multi-Chunk Prediction #Causal Modeling #Autoregressive Generation #Sequence Modeling

June 9, 2026
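The Next Forcing summary attributes the latency of conventional autoregressive models to emitting one token per step. The sketch below illustrates only that cost argument, not the paper's method: a generic causal decoding loop where each model call emits up to `chunk_size` tokens, so the number of sequential forward passes drops from `n_new` to `ceil(n_new / chunk_size)`. The step function `toy_step` is a hypothetical deterministic stand-in for a trained model; a real multi-chunk predictor would have to learn to predict a whole chunk jointly.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical step function type: given the current context, return the next k tokens.
StepFn = Callable[[List[int], int], List[int]]


def generate(step: StepFn, prompt: List[int], n_new: int, chunk_size: int = 1) -> Tuple[List[int], int]:
    """Causal decoding loop that emits up to `chunk_size` tokens per forward pass."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_new:
        remaining = n_new - (len(out) - len(prompt))
        k = min(chunk_size, remaining)  # the last chunk may be shorter
        out.extend(step(out, k))        # one model call produces k tokens
        passes += 1
    return out, passes


def expected_passes(n_new: int, chunk_size: int) -> int:
    """Number of sequential model calls needed for n_new tokens."""
    return math.ceil(n_new / chunk_size)


def toy_step(context: List[int], k: int) -> List[int]:
    # Deterministic stand-in for a model: keep counting from the context length.
    start = len(context)
    return list(range(start, start + k))


if __name__ == "__main__":
    for chunk in (1, 4):
        seq, passes = generate(toy_step, [0], n_new=10, chunk_size=chunk)
        print(f"chunk_size={chunk}: {passes} forward passes, sequence={seq}")
```

With this toy model both settings produce the same sequence, but chunked decoding needs 3 forward passes instead of 10; in practice the trade-off is whether quality holds up when several tokens are predicted at once.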

[Paper Review] FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

This paper addresses temporal collapse in generated videos, which occurs because autoregressive video diffusion models struggle to maintain long-term context.

#Review #Video Diffusion Models #Memory Consolidation #Autoregressive Generation #Temporal Consistency #Long-term Dependency

June 9, 2026

[Paper Review] VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

This paper addresses the large KV cache memory cost incurred during streaming generation with autoregressive video diffusion models.

#Review #Video Diffusion #Multi-Head Latent Attention #KV Cache #Autoregressive Generation #Low-Rank Latent #Streaming Video #3D-RoPE

June 1, 2026
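To make the VideoMLA memory problem concrete, the back-of-the-envelope sketch below compares a standard multi-head KV cache (one key and one value vector per head, per layer, per token) with a Multi-Head Latent Attention style cache that stores a single low-rank latent per layer per token. All model dimensions and the video length are hypothetical values chosen for illustration, not numbers from the paper, and the latent count ignores any extra positional-key dimensions a real MLA variant may also cache.

```python
# Hypothetical model and video configuration (illustrative only, not from the paper).
N_LAYERS = 30
N_HEADS = 12
HEAD_DIM = 128
LATENT_DIM = 512          # assumed size of the shared low-rank KV latent
BYTES_PER_ELEM = 2        # fp16 / bf16
TOKENS_PER_FRAME = 1560   # assumed latent tokens per latent frame
N_FRAMES = 240            # assumed number of cached latent frames


def kv_cache_bytes_per_token(n_layers: int, n_heads: int, head_dim: int, bytes_per_elem: int = BYTES_PER_ELEM) -> int:
    # Standard attention caches a key and a value vector for every head in every layer.
    return 2 * n_layers * n_heads * head_dim * bytes_per_elem


def latent_cache_bytes_per_token(n_layers: int, latent_dim: int, bytes_per_elem: int = BYTES_PER_ELEM) -> int:
    # MLA-style caching keeps one compressed latent per layer; keys and values
    # are re-projected from it when attention is computed.
    return n_layers * latent_dim * bytes_per_elem


def cache_gib(bytes_per_token: int, n_tokens: int) -> float:
    # The cache grows linearly with the number of tokens kept during streaming.
    return bytes_per_token * n_tokens / 2**30


if __name__ == "__main__":
    n_tokens = TOKENS_PER_FRAME * N_FRAMES
    full = kv_cache_bytes_per_token(N_LAYERS, N_HEADS, HEAD_DIM)
    latent = latent_cache_bytes_per_token(N_LAYERS, LATENT_DIM)
    print(f"full KV cache:   {cache_gib(full, n_tokens):.1f} GiB")
    print(f"latent KV cache: {cache_gib(latent, n_tokens):.1f} GiB")
    print(f"compression:     {full / latent:.1f}x")
```

Under these assumed dimensions the latent cache is 6x smaller per token; the actual savings depend entirely on the chosen latent size relative to `2 * n_heads * head_dim`.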

[Paper Review] Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

This paper proposes Light Interaction to address the excessive computational cost and inference latency of long-horizon generation with interactive video world models.

#Review #Interactive Video World Models #Inference Acceleration #Adaptive Context Management #Denoising Cache Acceleration #3D Sparse Attention #Autoregressive Generation

May 31, 2026

[Paper Review] Channel-wise Vector Quantization

This work addresses fundamental limitations of existing Vector Quantization (VQ)-based image tokenization and autoregressive generation.

#Review #Channel-wise Vector Quantization #Autoregressive Generation #Next-Channel Prediction #Codebook Utilization #Visual Tokenization #Image Reconstruction #Text-to-Image Generation #Nested Channel Dropout

May 25, 2026

[Paper Review] Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

This paper aims to use a foundation video generation model to generate consistent videos of unbounded length without any training.

#Review #Long Video Generation #Train-Free #Autoregressive Generation #Consistency Enhancement #Diffusion Models #Test-Time Scaling #Temporal Consistency

May 20, 2026

[Paper Review] FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

This paper addresses the difficulty of achieving real-time interactive garment switching and video generation at the same time. Existing subject-to-video (S2V) methods focus mainly on identity preservation and lack the real-time, flexible garment control required in the fashion industry and in content creation.

#Review #Video Customization #Garment Switching #Autoregressive Generation #In-Context Learning #Streaming Distillation #KV Cache Rescheduling #Real-Time Inference

May 17, 2026

[Paper Review] ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

This paper addresses the computational-efficiency and architectural-compatibility problems faced by existing visual reasoning techniques.

#Review #Visual Reasoning #Functional Token #LA-GRPO #Autoregressive Generation #Multimodal LLM #Agentic Reasoning

May 14, 2026

[Paper Review] Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

This paper addresses the visual signal dilution that autoregressive LVLMs suffer when generating long contexts.

#Review #Large Vision-Language Models #Visual Signal Dilution #Persistent Visual Memory #Autoregressive Generation #Multimodal Reasoning #Bottleneck Adapter

May 4, 2026

[Paper Review] AvatarPointillist: AutoRegressive 4D Gaussian Avatarization

This paper addresses the rigid, fixed topology imposed by existing one-shot avatar generation methods.

#Review #4D Gaussian Avatar #Autoregressive Generation #Transformer #3D Gaussian Splatting #One-shot Generation #Identity-preserving

April 6, 2026

[Paper Review] LoST: Level of Semantics Tokenization for 3D Shapes

Autoregressive (AR) models have recently emerged as a powerful paradigm for 3D generation, but the optimal way to tokenize 3D shapes remains an open problem.

#Review #3D Shape Tokenization #Semantic Salience #Autoregressive Generation #Relational Inter-Distance Alignment #Diffusion Models #Triplane

March 18, 2026

[Paper Review] WorldCompass: Reinforcement Learning for Long-Horizon World Models

This paper proposes WorldCompass, a reinforcement learning (RL)-based post-training framework that improves the long-horizon exploration accuracy and consistency of interactive video-based world models.

#Review #Reinforcement Learning #World Models #Video Generation #Autoregressive Generation #Long-Horizon #Post-training #Diffusion Models #Reward Functions

February 9, 2026

[Paper Review] Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

The core goal of this paper is to reduce the high computational cost of the two approaches used to improve LLM reasoning: reinforcement learning (RL)-based post-training and MCMC (Markov Chain Monte Carlo)-based power sampling.

#Review #LLM Reasoning #Distribution Sharpening #Power Sampling #Training-Free #Monte Carlo Estimation #Jackknife Correction #Autoregressive Generation #Inference Efficiency

January 29, 2026
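Power sampling, in the distribution-sharpening sense referenced by the Scalable Power Sampling summary, targets the sequence-level distribution p(x)^α / Z rather than lowering the temperature of each next-token distribution. The toy sketch below shows why the two differ, which is the reason exact power sampling is expensive (it needs to account for future continuations, e.g. via MCMC). The two-step model and its probabilities are hypothetical, and the paper's own Monte Carlo estimator and jackknife correction are not shown.

```python
from typing import Dict, Tuple

# Hypothetical two-step toy model: prefix -> next-token distribution.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"a1": 1 / 3, "a2": 1 / 3, "a3": 1 / 3},  # mass spread thin after A
    ("B",): {"b": 1.0},                                # one confident continuation after B
}


def sharpen(dist: Dict, alpha: float) -> Dict:
    """Return dist**alpha renormalized; alpha > 1 sharpens the distribution."""
    weights = {k: p ** alpha for k, p in dist.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def sequence_probs(model: Dict) -> Dict[Tuple[str, str], float]:
    """Exact probability of every complete two-token sequence."""
    return {
        (t1, t2): p1 * p2
        for t1, p1 in model[()].items()
        for t2, p2 in model[(t1,)].items()
    }


def power_first_token(model: Dict, alpha: float) -> Dict[str, float]:
    """First-token marginal under the sequence-level power distribution p(x)^alpha / Z."""
    marginal: Dict[str, float] = {}
    for (t1, _), p in sharpen(sequence_probs(model), alpha).items():
        marginal[t1] = marginal.get(t1, 0.0) + p
    return marginal


def low_temperature_first_token(model: Dict, alpha: float) -> Dict[str, float]:
    """First-token distribution when each conditional is sharpened on its own (temperature 1/alpha)."""
    return sharpen(model[()], alpha)


if __name__ == "__main__":
    print("power distribution:", power_first_token(TOY_MODEL, 2.0))
    print("low temperature:   ", low_temperature_first_token(TOY_MODEL, 2.0))
```

With α = 2, per-token low-temperature sampling puts 9/13 of the first-token mass on A, while the sequence-level power distribution puts only 3/7 on A, because every continuation after A is individually unlikely.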

[Paper Review] Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

This paper challenges the prevailing assumption that every token in the autoregressive generation process of a Vision-Language Model (VLM) contributes equally to model instability.

#Review #Vision-Language Models #Adversarial Attacks #Entropy-Guided Attacks #Token Vulnerability #Harmful Content #Cross-Model Transferability #Autoregressive Generation

January 8, 2026

[Paper Review] VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

This paper addresses the problem that autoregressive (AR) visual generation models are optimized only at the token level and therefore produce low-quality images in pixel space.

#Review #Autoregressive Generation #Pixel-Aware Alignment #Variational Optimization #Reinforcement Learning #Visual Tokenizers #Image Quality #ELBO #Post-Training Framework

December 25, 2025

[Paper Review] WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

The paper addresses the fundamental problem of generating long-range, geometrically consistent novel-view videos from a single image.

#Review #Novel View Synthesis #3D Geometry Propagation #Video Diffusion Models #Gaussian Splatting #Autoregressive Generation #Spatio-Temporal Noise #Geometric Consistency

December 22, 2025

[Paper Review] FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering

For G-buffer-conditioned neural forward frame rendering in interactive applications, this paper aims to generate photorealistic images autoregressively, frame by frame, while maintaining temporal consistency. It seeks to overcome both the temporal inconsistency of existing single-image models and the high computational cost of video models.

#Review #Neural Rendering #Diffusion Models #G-Buffer #Autoregressive Generation #Temporal Consistency #ControlNet #ControlLoRA #Interactive Applications

December 18, 2025

[Paper Review] OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

This paper aims to solve the visual inconsistency and loss of coherence that arise because existing multi-shot video generation (MSV) models cannot effectively model the long-range cross-shot context that complex narratives require.

#Review #Multi-Shot Video Generation #Adaptive Memory #Long-Range Context #Frame Selection #Diffusion Models #Image-to-Video #Autoregressive Generation #Narrative Coherence

December 9, 2025

[Paper Review] Rethinking Training Dynamics in Scale-wise Autoregressive Generation

This work aims to recover the generation quality that scale-wise autoregressive (AR) models lose to (1) train-inference mismatch (exposure bias) and (2) imbalanced learning difficulty across scales.

#Review #Autoregressive Generation #Visual Synthesis #Exposure Bias #Student Forcing #Self-Autoregressive Refinement #Scale-wise Prediction #Image Generation

December 8, 2025

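[Paper Review] MotionStream: Real-Time Video Generation with Interactive Motion Controls

Existing motion-controlled video generation models have high latency (several minutes per video) and rely on non-causal processing, which makes real-time interaction impossible. The paper proposes a new framework that overcomes these limits and enables real-time streaming generation of infinite-length video under interactive motion control.

#Review #Real-Time Video Generation #Motion Control #Diffusion Models #Autoregressive Generation #Self-Forcing #Attention Sink #Streaming Inference #Video Distillation

November 9, 2025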

[Paper Review] Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

This paper addresses the fact that existing open-source Diffusion Large Language Models (dLLMs) do not outperform Autoregressive (AR) LLMs in inference speed.

#Review #Diffusion LLMs #Faster Inference #Discrete Diffusion Forcing (D2F) #Autoregressive Generation #KV Cache Optimization #Parallel Decoding #Text Generation #Model Distillation

August 14, 2025

[Paper Review] LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

Existing video generation models work well on short clips but struggle to generate ultra-long videos of one minute or more because of temporal inconsistency and visual degradation. This paper aims to solve that problem.

#Review #Ultra-long Video Generation #Multimodal Guidance #Controllable Video Generation #Diffusion Models #Temporal Consistency #Visual Quality #Autoregressive Generation #Degradation-aware Training

August 6, 2025

[Paper Review] Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs

The paper targets the difficulties Multimodal Large Language Models (MLLMs) face in scenarios that demand complex visual planning and imagination. It gives MLLMs an internal visual scratchpad so that they can reason through visual thoughts, improving their multimodal reasoning.

#Review #Multimodal LLMs #Visual Reasoning #Latent Space #Sketch Generation #Visual Thinking #Autoregressive Generation #Interpretability

October 29, 2025

[Paper Review] ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

This paper points out that existing MLLM-based segmentation methods are limited in capturing fine, pixel-level visual detail, and proposes ARGenSeg, a new paradigm based on autoregressive generation.

#Review #Image Segmentation #Autoregressive Generation #Multimodal Large Language Models (MLLMs) #Visual Understanding #VQ-VAE #Multi-scale Prediction #Referring Expression Segmentation #Image Generation

October 24, 2025

[Paper Review] Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs

Existing MLLMs rely on indirect representations for vision tasks, such as emitting coordinates as text. This limits their performance and makes dense prediction tasks such as segmentation difficult; the paper aims to solve this problem.

#Review #Multimodal Large Language Models (MLLMs) #Visual Reference Tokens (VRTs) #Dense Prediction #Referring Expression Comprehension (REC) #Open-Vocabulary Detection (OVD) #Image Captioning #Unified Architecture #Autoregressive Generation

October 9, 2025

[Paper Review] Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

The paper addresses the problem that, in existing autoregressive vision models, quantization error from discrete latent-space tokenizers undermines semantic expressiveness and vision-language understanding.

#Review #Unified Vision-Language Model #Continuous Tokenizer #Autoregressive Generation #Image Understanding #Image Generation #Multimodal AI #In-context Editing

October 9, 2025