#Autoregressive Models

36개의 포스트

[논문리뷰] VideoRAE: Taming Video Foundation Models for Generative Modeling via Representation Autoencoders

본 논문은 기존의 3D-VAE 기반 비디오 토크나이저가 픽셀 단위의 복원(MSE)에 과도하게 최적화되어 고차원적인 의미론적 구조를 포착하지 못한다는 문제점을 해결하고자 합니다.

#Review #Video Foundation Models #Representation Autoencoders #Generative Modeling #Representation Alignment #Latent Spaces #Diffusion Transformers #Autoregressive Models

2026년 7월 19일

[논문리뷰] Streaming Autoregressive Video Generation via Diagonal Distillation

대규모 확산 모델의 제한된 실시간 스트리밍 기능을 개선하고, 기존 자기회귀 모델의 높은 연산 비용으로 인한 낮은 품질 문제를 해결하는 것이 목표입니다.

#Review #Video Generation #Autoregressive Models #Diffusion Models #Distillation #Real-time #Streaming #Temporal Coherence #Flow Matching

2026년 3월 10일

[논문리뷰] PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

본 연구는 단일 RGB 이미지로부터 완전한 3D 실내 장면의 메쉬를 자동회귀 방식으로 재구성하는 것을 목표로 합니다.

#Review #Single-View 3D Reconstruction #Autoregressive Models #Mesh Generation #Scene Understanding #Transformer #Point Cloud Features #Pose Estimation

2026년 3월 8일

[논문리뷰] Helios: Real Real-Time Long Video Generation Model

논문은 단일 NVIDIA H100 GPU 에서 19.5 FPS 로 실시간 분 단위 비디오를 생성하고, 기존의 안티-드리프팅(anti-drifting) 휴리스틱이나 가속화 기술 없이도 강력한 품질을 유지하는 최초의 14B 비디오 생성 모델 인 Helios를 개발하는 것을 목표로 합니다.

#Review #Video Generation #Real-Time #Long Video #Diffusion Transformers #Anti-Drifting #Memory Optimization #Distillation #Autoregressive Models

2026년 3월 4일

[논문리뷰] Spilled Energy in Large Language Models

본 논문은 대규모 언어 모델(LLM)에서 발생하는 환각(hallucination) 을 추가적인 훈련 없이 효과적으로 탐지하는 것을 목표로 합니다.

#Review #LLM Hallucination Detection #Energy-Based Models #Training-Free #Logit Analysis #Spilled Energy #Cross-Task Generalization #Autoregressive Models

2026년 3월 3일

[논문리뷰] Causal Motion Diffusion Models for Autoregressive Motion Generation

본 논문은 기존 모션 확산 모델의 인과성 부족과 자기회귀 모델의 불안정성 및 오류 누적 문제를 해결하여, 고품질의 시간적으로 순서가 보장되는(temporally ordered) 모션 생성을 목표로 합니다.

#Review #Motion Generation #Diffusion Models #Autoregressive Models #Causal Modeling #Latent Space #Text-to-Motion #Human Motion Synthesis #Streaming Generation

2026년 2월 26일

[논문리뷰] BitDance: Scaling Autoregressive Generative Models with Binary Tokens

본 논문은 기존 Autoregressive (AR) 모델의 제한된 토큰 표현력과 비효율적인 샘플링 문제를 해결하여, 고품질 이미지 생성을 위한 확장 가능한 AR 프레임워크인 BitDance 를 제안합니다.

#Review #Autoregressive Models #Binary Tokens #Diffusion Head #Image Generation #Tokenizer #Parallel Prediction #High-Resolution

2026년 2월 16일

[논문리뷰] MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

기존 오디오 토크나이저의 사전 학습된 인코더 , 의미론적 증류 , 이질적인 CNN 기반 아키텍처 의존성으로 인한 재구성 충실도 및 확장성 한계를 극복하는 것이 목표입니다.

#Review #Audio Tokenizer #Transformer Architecture #End-to-End Learning #Residual Vector Quantization #Speech Synthesis #Audio Foundation Models #Scalability #Autoregressive Models

2026년 2월 12일

[논문리뷰] Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss

본 연구는 오토회귀(Autoregressive) 이미지 생성 모델 이 확산 손실(diffusion loss) 과 결합될 때 발생하는 '조건 불일치(condition inconsistency)' 문제를 해결하고, 이로 인해 누적되는 extraneous 정보가 패치 생성 품질을 저해하는 한계를 극복하는 것을 목표로 합니다.

#Review #Autoregressive Models #Diffusion Models #Image Generation #Condition Refinement #Optimal Transport #Wasserstein Gradient Flow #Score Matching #Patch Denoising

2026년 2월 10일

[논문리뷰] Context Forcing: Consistent Autoregressive Video Generation with Long Context

이 논문은 현재 자동회귀 비디오 생성 모델들이 짧은 컨텍스트 윈도우와 학생-교사 불일치로 인해 장기적인 일관성(forgetting-drifting dilemma)을 유지하기 어렵다는 문제를 해결하고자 합니다.

#Review #Video Generation #Autoregressive Models #Long Context #Temporal Consistency #Diffusion Models #Context Forcing #Memory Management #Distribution Matching Distillation

2026년 2월 5일

[논문리뷰] iFSQ: Improving FSQ for Image Generation with 1 Line of Code

이미지 생성 분야의 Autoregressive(AR) 모델과 Diffusion 모델 간의 단절을 해소하고, 이들을 위한 통일된 토크나이저를 구축 하는 것을 목표로 합니다.

#Review #Finite Scalar Quantization (FSQ)#Image Generation #Autoregressive Models #Diffusion Models #Quantization #Tokenization #Representation Alignment (REPA)#Latent Space

2026년 1월 26일

[논문리뷰] AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

본 논문은 기존 멀티모달 대규모 언어 모델(MLLM)이 멀티모달 생성을 위해 외부 전문가 구성 요소(예: 확산 디코더)에 의존하는 한계를 극복하고자 합니다.

#Review #Autoregressive Models #Multimodal AI #Any-to-Any Generation #Unified Model #Speech Generation #Image Generation #Transformer Decoder #Real-time Streaming

2026년 1월 26일

[논문리뷰] The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

이 논문은 Diffusion Large Language Models (dLLMs)의 핵심 이점으로 여겨지는 임의 순서(arbitrary order) 생성 능력 이 실제 추론 잠재력을 제한한다는 역설적인 현상을 밝히고, dLLM의 추론 능력을 더 효과적으로 이끌어내기 위한 새로운 RL 방법론 을 제시하는 것을 목표로 합니다.

#Review #Diffusion Language Models #Reasoning #Reinforcement Learning #Autoregressive Models #Generation Order #Entropy Degradation #Pass@k #GRPO

2026년 1월 22일

[논문리뷰] VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

비디오 생성 분야에서 Diffusion 및 Flow-Matching 모델 의 높은 계산 비용과 확장성 문제를 해결하는 것을 목표로 합니다.

#Review #Video Generation #Autoregressive Models #Next-Frame Prediction #Multi-scale Prediction #Temporal Consistency #Visual Autoregressive #Error Propagation

2026년 1월 11일

[논문리뷰] Pretraining Frame Preservation in Autoregressive Video Memory Compression

본 논문은 오토회귀 비디오 생성 모델에서 발생하는 긴 비디오 컨텍스트 처리의 한계 와 컨텍스트 품질 및 길이 간의 트레이드오프 문제를 해결하고자 합니다.

#Review #Video Compression #Autoregressive Models #Memory Compression #Frame Preservation #Pretraining #Video Generation #Diffusion Models #Long-Range Consistency

2025년 12월 31일

[논문리뷰] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

본 논문은 기존 확산 비전 언어 모델(dVLMs)의 성능 저하와 가변 길이 생성 및 KV 캐시 재사용의 비효율성 문제를 해결하고자 합니다.

#Review #Diffusion Models #Vision Language Models #Autoregressive Models #Diffusion Finetuning #Block Diffusion #Multimodal AI #KV Cache

2025년 12월 17일

[논문리뷰] Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

텍스트-3D 자동회귀(autoregressive) 생성 모델에 강화 학습(RL) 을 체계적으로 적용하고 그 효과를 분석하는 것을 목표로 합니다. 특히, 3D 객체의 복잡한 기하학적 구조와 미세한 질감을 고려하여 보상 설계 와 RL 알고리즘 선택 이 3D 생성 성능에 미치는 영향을 심층적으로 탐구합니다.

#Review #Reinforcement Learning #Text-to-3D Generation #Autoregressive Models #Reward Modeling #Hierarchical RL #3D Benchmarking #ShapeLLM-Omni

2025년 12월 11일

[논문리뷰] From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs

본 논문은 순차적인 자동회귀(AR) LLM의 추론 병목 현상을 해결하고자 합니다.

#Review #Diffusion Language Models #LLM Adaptation #Block-Diffusion #Autoregressive Models #Attention Masks #Parallel Generation #Transfer Learning #Generative Models

2025년 12월 9일

[논문리뷰] Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

효율적인 스트리밍 비디오 생성 시 기존 방법론들이 정적 초기 토큰에 과도하게 의존하여 동적 움직임 저하와 '프레임 복사' 문제를 겪는 한계를 극복하고자 합니다. 본 연구는 실시간으로 높은 시각적 충실도와 강력한 움직임 역동성을 동시에 유지하는 비디오 생성을 목표로 합니다.

#Review #Streaming Video Generation #Video Diffusion Models #Distribution Matching Distillation #Reinforcement Learning #Autoregressive Models #Attention Sink #Real-time

2025년 12월 4일

[논문리뷰] RELIC: Interactive Video World Model with Long-Horizon Memory

논문은 실시간 장기 스트리밍, 일관된 공간 메모리, 정밀한 사용자 제어라는 세 가지 핵심 요소를 동시에 만족하는 상호작용 가능한 비디오 월드 모델 을 구축하는 것을 목표로 합니다. 기존 접근 방식들이 이 중 하나만을 다루거나, 장기 메모리 메커니즘이 실시간 성능을 저하시키는 문제를 해결하고자 합니다.

#Review #Interactive World Model #Video Generation #Long-Horizon Memory #Real-Time Streaming #Diffusion Models #Autoregressive Models #Spatial Consistency #Unreal Engine

2025년 12월 3일

[논문리뷰] Diffusion Language Models are Super Data Learners

본 논문은 고품질 데이터 희소성이 LLM 훈련의 주요 병목이 되는 시대에, Autoregressive (AR) 모델 과 Diffusion Language Models (DLMs) 중 어떤 패러다임이 제한된 고유 데이터로부터 더 많은 신호를 추출하는지 규명하는 것을 목표로 합니다.

#Review #Diffusion Language Models #Autoregressive Models #Data Efficiency #Scaling Laws #Data-Constrained Learning #Crossover Phenomenon #Pre-training #Masked Diffusion

2025년 11월 9일

[논문리뷰] LongLive: Real-time Interactive Long Video Generation

실시간 및 대화형으로 고품질의 긴 비디오를 생성하는 데 따르는 효율성, 일관성, 그리고 시맨틱 일관성 문제를 해결하는 것을 목표로 합니다. 특히, 프롬프트 전환 시 시각적 일관성과 동적 콘텐츠 생성을 위한 상호작용성 부족이라는 기존 AR 및 Diffusion 모델의 한계를 극복하고자 합니다.

#Review #Long Video Generation #Real-time #Interactive AI #Autoregressive Models #KV Cache #Streaming Tuning #Attention Sink #Diffusion Models

2025년 9월 29일

[논문리뷰] A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

로봇의 실세계 강화 학습(RL)에서 희소하고 수작업으로 제작된 보상 및 비효율적인 탐색 으로 인한 병목 현상을 해결하는 것을 목표로 합니다.

#Review #Robotics #Reinforcement Learning (RL)#Vision-Language-Action (VLA) Models #Reward Modeling #Human-in-the-Loop #Dense Rewards #Generalization #Autoregressive Models

2025년 9월 22일

[논문리뷰] Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

본 논문은 자연어 처리에서 성공적인 자기회귀(Autoregressive, AR) 모델이 이미지 생성 시 고수준 시각적 의미 학습에 어려움을 겪는 문제를 해결하고자 합니다.

#Review #Autoregressive Models #Image Generation #Self-Supervised Learning #Visual Understanding #Masked Image Modeling #Contrastive Learning #Next-Token Prediction #LlamaGen

2025년 9월 19일

[논문리뷰] Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing

본 연구는 시각적 자기회귀(VAR) 모델 에서 추가 훈련 없이 프롬프트 기반 이미지 편집 기능을 구현하는 것을 목표로 합니다. 기존 VAR 모델의 편집 능력 한계를 극복하고, 원본 이미지의 관련 없는 세부 사항을 보존하면서 텍스트 프롬프트에 따라 타겟 편집을 정확하고 제어 가능하게 수행하는 방법론을 개발하고자 합니다.

#Review #Image Editing #Autoregressive Models #Noise Inversion #Text-to-Image #Gumbel-max Trick #Training-free #Location-aware Argmax Inversion

2025년 9월 3일

[논문리뷰] FastMesh:Efficient Artistic Mesh Generation via Component Decoupling

기존 메시 생성 방식이 토큰 시퀀스 내의 정점(vertex) 중복 사용으로 인해 발생하는 비효율성(과도한 토큰 길이, 느린 생성 프로세스)을 해결하고, 정점과 면(face)을 분리하여 처리 함으로써 고품질의 예술적 메시를 더욱 효율적이고 빠르게 생성 하는 것을 목표로 합니다.

#Review #3D Mesh Generation #Component Decoupling #Autoregressive Models #Bidirectional Transformer #Fidelity Enhancement #Prediction Filtering #Token Efficiency #Artistic Meshes

2025년 8월 27일

[논문리뷰] SpotEdit: Evaluating Visually-Guided Image Editing Methods

이 논문은 기존 벤치마크의 단순성과 실제 편집 과제에 대한 낮은 대표성이라는 한계를 극복하기 위해, 시각적으로 안내되는 이미지 편집(Visually-Guided Image Editing) 모델을 체계적이고 세밀하게 평가하기 위한 포괄적인 벤치마크인 SpotEdit 을 소개합니다.

#Review #Visually-Guided Image Editing #Multimodal Models #Benchmark #Hallucination #Diffusion Models #Autoregressive Models #Evaluation Metrics

2025년 8월 26일

[논문리뷰] Representing Speech Through Autoregressive Prediction of Cochlear Tokens

본 논문은 인간의 청각 처리 계층에서 영감을 받아, 유연하고 효율적으로 음성 정보를 이해하고 상호작용하는 인공 신경망 모델을 개발하는 것을 목표로 합니다.

#Review #Speech Representation Learning #Autoregressive Models #Cochlear Tokens #Biologically Inspired AI #Self-Supervised Learning #Audio Processing #Transformer Networks

2025년 8월 19일

[논문리뷰] NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

이 논문은 텍스트-이미지 생성 분야에서 기존 autoregressive (AR) 모델이 직면한 양자화 손실 및 무거운 확산 모델 의존성 의 한계를 극복하고자 합니다.

#Review #Autoregressive Models #Text-to-Image Generation #Continuous Latent Tokens #Flow Matching #Image Editing #Multimodal Learning #Transformer Architecture

2025년 8월 15일

[논문리뷰] VertexRegen: Mesh Generation with Continuous Level of Detail

기존 자동회귀 메쉬 생성 모델들이 부분-완료 방식으로 동작하여, 유효한 메쉬를 얻기 위해 전체 시퀀스를 생성해야만 하고 중간 단계에서는 불완전한 구조를 생성하는 문제를 해결하고자 합니다.

#Review #Mesh Generation #Level of Detail (LOD)#Progressive Meshes #Vertex Split #Autoregressive Models #Transformer #3D Graphics

2025년 8월 13일

[논문리뷰] Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation

본 논문은 이미지 이해, 텍스트-투-이미지 생성, 이미지 편집 기능을 단일 아키텍처 내에서 통합하는 1.5억 개 파라미터 의 자기회귀 모델 인 Skywork UniPic 을 소개합니다.

#Review #Autoregressive Models #Multimodal AI #Image Generation #Image Editing #Visual Understanding #Unified Architecture #Parameter Efficiency

2025년 8월 6일

[논문리뷰] FARMER: Flow AutoRegressive Transformer over Pixels

본 논문은 연속적인 autoregressive 모델링이 직면하는 긴 시퀀스 및 고차원 공간 문제를 해결하며, Normalizing Flows (NF) 와 Autoregressive (AR) 모델을 결합하여 픽셀 수준에서 정확한 우도 추정과 고품질 이미지 합성을 위한 단일화된 생성 프레임워크인 FARMER 를 제시합니다.

#Review #Normalizing Flows #Autoregressive Models #Generative Models #Image Synthesis #Tractable Likelihood #Dimension Reduction #Distillation #Classifier-Free Guidance

2025년 10월 28일

[논문리뷰] Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation

본 논문은 순차적인 토큰별 디코딩 과정으로 인해 수천 번의 모델 포워드 패스를 요구하는 자율회귀 텍스트-투-이미지 모델의 느린 추론 속도 문제를 해결하는 것을 목표로 합니다. 병렬 토큰 디코딩을 통해 자율회귀 텍스트-투-이미지 생성 모델의 추론을 가속화하고자 합니다.

#Review #Autoregressive Models #Text-to-Image Generation #Inference Acceleration #Jacobi Decoding #Denoising Diffusion Models #Speculative Decoding #Multi-token Prediction #Fine-tuning

2025년 10월 13일

[논문리뷰] Heptapod: Language Modeling on Visual Signals

이 논문은 시각 생성 모델에서 외부 의미론적 정보 주입 및 CFG(Classifier-Free Guidance)에 대한 의존성을 비판하며, 재구성 중심의 토크나이저 와 Transformer의 내재적 의미 학습 이라는 언어 모델링의 기본 원칙으로 회귀하는 것을 목표로 합니다.

#Review #Autoregressive Models #Image Generation #Language Modeling #Causal Transformer #2D Distribution Prediction #Visual Tokenization #Self-Supervised Learning #Generative Models

2025년 10월 9일

[논문리뷰] D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

본 논문은 시각적 자기회귀(AR) 모델 이 생성한 이미지의 탐지라는 새로운 도전 과제를 해결하는 것을 목표로 합니다. 기존 GAN이나 Diffusion 모델 탐지 방법론과 달리, AR 모델의 이산 토큰 예측 및 코드북 의 독특한 패턴과 빈도 분포 편향을 활용하여 실제 이미지와 생성된 이미지 간의 차이를 식별하고자 합니다.

#Review #Autoregressive Models #Image Detection #Discrete Distribution Discrepancy #Quantization Error #Transformer #Generative AI #Deepfake Detection

2025년 10월 9일

[논문리뷰] Fast-dLLM v2: Efficient Block-Diffusion LLM

본 논문은 Autoregressive (AR) 대규모 언어 모델(LLMs) 의 본질적인 순차적 디코딩으로 인한 추론 비효율성을 해결하는 것을 목표로 합니다.

#Review #Diffusion LLMs #Inference Acceleration #Parallel Decoding #Autoregressive Models #Caching #Fine-tuning #Block-wise Attention

2025년 10월 8일