#Flow Matching

116 posts

[Paper Review] Xiaomi-Robotics-1: Scaling Vision-Language-Action Models with over 100K Hours of Real-World Trajectories

This paper focuses on realizing the promise of scaling laws: building a general-purpose robot policy by training on large-scale data.

#Review #Vision-Language-Action Models #Scaling Laws #Real-World Trajectories #Robot Foundation Models #Flow Matching #Auto-labeling #Cross-embodiment Learning

July 19, 2026

[Paper Review] MeanFlowNFT: Bringing Forward-Process RL to Average-Velocity Generators

Existing diffusion and flow models need many iterative evaluations to produce high-quality samples, which makes them inefficient in terms of latency.

#Review #MeanFlow #Reinforcement Learning #Forward-Process RL #Flow Matching #Few-step Generation #Average Velocity

July 16, 2026

[Paper Review] GigaWorld-Policy-0.5: A Faster and Stronger WAM Empowered by AutoResearch

This paper aims to address the high computational overhead, and the resulting limits on real-time control, that arise because existing WAM (World Action Model) approaches require explicit future-video generation at inference time.

#Review #World Action Models #Robot Control #Mixture-of-Transformers #AutoResearch #Inference Latency #Flow Matching #Visual Dynamics

July 15, 2026

[Paper Review] LATO.2: Factorized 3D Mesh Generation with Vertex and Topology Flow

Existing 3D mesh generation models tend to learn vertex positions and topological connectivity jointly in a single shared latent space, which limits how efficiently they can handle these two statistically heterogeneous signals.

#Review #3D Mesh Generation #Flow Matching #Factorized Representation #Vertex Flow #Topology Flow #Latent Representation

July 13, 2026

[Paper Review] Flow-ERD: Agent-type Aware Flow Matching with Entropy-Regularized Distillation for Diverse Traffic Simulation

This paper addresses the trade-off between two key properties of autonomous-driving simulation: realism and diversity.

#Review #Multi-Agent Simulation #Flow Matching #Entropy-Regularized Distillation #Autonomous Driving #Traffic Simulation #Realism-Diversity Pareto

July 12, 2026

[Paper Review] Enhancing In-context Panoramic Generation via Geometric-aware Pretraining

This paper was proposed to address the lack of 3D geometric consistency in existing panoramic image generation models.

#Review #Panoramic Generation #In-context Learning #Geometry-aware Pretraining #Flow Matching #Velocity Circular Padding #Canvas360Dataset

July 9, 2026

[Paper Review] RynnWorld-4D: 4D Embodied World Models for Robotic Manipulation

Existing world models for robotic manipulation rely mainly on 2D pixel-based video generation, which limits their ability to capture the precise 3D spatial relationships and physical consistency that real robot systems require.

#Review #4D Embodied World Models #Robotic Manipulation #Generative Video Models #RGB-DF Representation #Flow Matching #Joint Cross-Modal Attention #Embodied AI

July 7, 2026

[Paper Review] Perceptual Flow Matching for Few-Step Generative Modeling

This paper addresses the high inference latency of existing Flow Matching models, which need dozens of sampling steps (35–50) to produce high-quality outputs.

#Review #Flow Matching #Few-Step Generation #Perceptual Supervision #Perceptual Feature Space #Generative Modeling #Classifier-free Guidance

July 6, 2026
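Why step count drives latency can be seen in a minimal sketch: sampling a flow model means integrating the ODE $dx/dt = v(x, t)$ from noise ($t=0$) to data ($t=1$), and a fixed-step Euler solver calls the velocity network once per step, so NFE equals the number of steps. The velocity field here is a hypothetical scalar toy ($v = -x$, exact solution $x(1) = x_0 e^{-1}$) standing in for a trained network; it is not the paper's method. Because its trajectory is curved, Euler needs many steps to approach the exact endpoint.

```python
import math
from typing import Callable


def euler_sample(
    velocity: Callable[[float, float], float], x0: float, num_steps: int
) -> tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of function evaluations (NFE).
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    dt = 1.0 / num_steps
    x, nfe = x0, 0
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t)  # one velocity-network call per step
        nfe += 1
    return x, nfe


def toy_velocity(x: float, t: float) -> float:
    """Hypothetical curved toy field dx/dt = -x (stands in for a trained model)."""
    return -x


if __name__ == "__main__":
    exact = math.exp(-1.0)  # exact x(1) when x0 = 1.0
    for steps in (1, 4, 50):
        x1, nfe = euler_sample(toy_velocity, 1.0, steps)
        print(f"steps={steps:3d}  NFE={nfe:3d}  |error|={abs(x1 - exact):.4f}")
```

If the velocity were constant along the path (a perfectly straight trajectory), a single Euler step would land exactly on the endpoint, which is one way to see why few-step methods try to straighten or shortcut the flow.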

[Paper Review] InternVLA-A1.5: Unifying Understanding, Latent Foresight, and Action for Compositional Generalization

This paper addresses three problems facing existing VLA models: Semantics Erosion (degradation of the pretrained backbone's semantic knowledge), Heterogeneous Objective Interference (interference between different training objectives), and the high cost of pixel-level future prediction.

#Review #Vision-Language-Action Models #Robot Manipulation #Latent Foresight #Compositional Generalization #Multimodal Co-training #Flow Matching

July 6, 2026

[Paper Review] WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

This paper proposes WorldDirector to address the failure of existing video world models to preserve the motion and identity of objects once they leave the field of view.

#Review #World Model #Video Generation #Dynamic Memory #Object Permanence #Controllable Simulation #Flow Matching #Spatial-Aware Control

July 2, 2026

[Paper Review] Optimizing Visual Generative Models via Distribution-wise Rewards

This paper addresses reward hacking during reinforcement learning of visual generative models, along with the loss of generation diversity and the visual artifacts it causes.

#Review #Distribution-wise Reward #Reinforcement Learning #Visual Generative Models #Subset-replace Strategy #Model Merging #Flow Matching #FID

July 2, 2026

[Paper Review] Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

This paper tackles the excessive computational cost of state-of-the-art diffusion and flow matching models. Existing approaches such as timestep distillation and feature caching either require training or offer only limited speedups.

#Review #Diffusion Models #Flow Matching #Training-Free Acceleration #Multi-Resolution Generation #Staged Sampling #Super-Resolution

July 2, 2026

[Paper Review] PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation

This paper proposes PolyFlow to address the severe inference latency and error accumulation of existing autoregressive (AR) mesh generation models. AR methods serialize a mesh into a fixed sequence and predict tokens one at a time, so generation is very slow and errors tend to accumulate on complex shapes.

#Review #Mesh Generation #Flow Matching #Topology Embedding #Retopology #Transformer #Parallel Generation #3D-Native

June 30, 2026

[Paper Review] Qwen-Image-2.0-RL Technical Report

This work aims to narrow the gap between the generation quality and the instruction-following ability of the Qwen-Image-2.0 diffusion model, and to obtain consistent performance on complex editing tasks.

#Review #RLHF #On-policy Distillation #Diffusion Models #Reward Modeling #Flow Matching #GRPO #Qwen-Image-Bench

June 28, 2026

[Paper Review] DanceOPD: On-Policy Generative Field Distillation

This work addresses how a single model can combine multiple, potentially conflicting generation capabilities, such as T2I and local/global editing, while preserving the performance of each. Existing data-mixing and model-merging approaches either cause gradient conflicts between capabilities or dilute performance.

#Review #Generative Field Distillation #Flow Matching #On-Policy Distillation #Capability Composition #Hard-Routed Field Matching #Multi-Capability Alignment

June 25, 2026

[Paper Review] World Value Models for Robotic Manipulation

This work starts from the observation that existing robot value models rely on static, image-based VLM backbones, which limits their understanding of long-horizon temporal context and future outcomes.

#Review #World Models #Robotic Manipulation #Value Estimation #Flow Matching #Distributional Value #Suboptimal-Value-Bench

June 23, 2026

[Paper Review] ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

Existing WAMs infer robot actions by generating future video, an approach with three serious limitations. First, inference is extremely expensive because spatiotemporal tokens must be processed for many frames.

#Review #World Action Models #Image Editing #Robot Manipulation #Flow Matching #Efficient Inference #Embodied AI

June 18, 2026

[Paper Review] FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

This paper addresses the degraded generation quality and alignment failures that arise because existing conditional generative models treat the conditioning signal only as a static input.

#Review #Flow Matching #Conditional Generation #Feedback-Aware Training #Closed-Loop Inference #Self-Correction

June 18, 2026

[Paper Review] PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Most existing World Foundation Models either operate on a single view or simply concatenate multiple views along the sequence dimension, so they fail to provide the 3D consistency that robotic manipulation requires.

#Review #World Foundation Model #Robotic Manipulation #3D Consistency #Diffusion Transformer #Flow Matching #Multi-view Generation

June 17, 2026

[Paper Review] BadWorld: Adversarial Attacks on World Models

This paper proposes BadWorld, the first adversarial attack framework for evaluating the potential vulnerabilities of visual world models (VWMs).

#Review #Adversarial Attack #Visual World Models #Autoregressive Generation #Flow Matching #Trajectory-Adaptive Optimization #Label-Free

June 15, 2026

[Paper Review] World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

This paper addresses the trade-off between faithfulness and completeness in existing single-image 3D estimation methods.

#Review #World Tracing #Pixel-Aligned #Geometry Generation #Diffusion Transformer #Flow Matching #Multilayer #3D Vision

June 14, 2026

[Paper Review] WaveDiT: Distribution-Aware Wavelet Flow Matching for Efficient 3D Brain MRI Synthesis

This paper proposes WaveDiT to address the high computational cost and the loss of fine anatomical detail in 3D MRI synthesis.

#Review #3D MRI Synthesis #Flow Matching #Discrete Wavelet Transform #Heteroscedastic Uncertainty #Generative Models #Brain Age Prediction

June 14, 2026

[Paper Review] Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

This work aims to overcome the limitations of fragmented robot learning systems by building a unified, end-to-end VLA learning stack that spans everything from data collection to real-world deployment.

#Review #Vision-Language-Action Models #Embodied AI #Flow Matching #Robot Learning Stack #Proximalized Preference Optimization #UMI

June 14, 2026

[Paper Review] PianoKontext: Expressive Performance Rendering from Deadpan Context

This paper proposes PianoKontext to address the inability of existing music generation models to properly model expressive timing and the complexity of polyphonic music.

#Review #Expressive Performance Rendering #Flow Matching #Latent Diffusion #Dynamic Time Warping #Music2Latent #DiT #RoPE

June 11, 2026

[Paper Review] Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

This paper addresses the unstable performance that arises because existing reinforcement-learning fine-tuning methods do not adequately account for the stochastic dynamics specific to Flow Matching models.

#Review #Flow Matching #RLHF #Proximal Policy Optimization #Divergence Constraint #Policy Optimization

June 9, 2026

[Paper Review] OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

This paper addresses the shortage of large-scale, high-quality demonstration data for humanoid loco-manipulation tasks.

#Review #Humanoid Loco-Manipulation #Simulation Data Collection #Zero-Shot Transfer #Domain Randomization #Visuomotor Policy #Flow Matching #Unitree G1

June 8, 2026

[Paper Review] Flash-WAM: Modality-Aware Distillation for World Action Models

Although WAMs perform strongly on manipulation benchmarks, their high inference latency hinders real-time control, and this paper sets out to fix that. Existing WAMs need dozens of iterative denoising steps for both video and actions, which makes them unsuitable for real-time robot control.

#Review #World-Action Models #Step Distillation #Consistency Models #Robotic Foundation Models #Flow Matching #Modality-Aware Distillation

June 4, 2026

[Paper Review] Qwen-Image-Flash: Beyond Objective Design

This paper points out that prior few-step distillation research has focused almost exclusively on designing the distillation objective, overlooking the impact of the actual training recipe.

#Review #Few-step Distillation #Flow Matching #DMD #T2I Generation #Image Editing #Training Recipe #Multi-teacher Guidance

June 3, 2026

[Paper Review] Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

This paper proposes ByG (Bootstrap Your Generator), a general-purpose framework that enables instruction-based visual editing without a large paired dataset.

#Review #Flow Matching #Unpaired Editing #Cycle Consistency #Straight-Through Estimation #Gradient Routing #Bootstrap #Visual Editing

June 2, 2026

[Paper Review] EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers

This paper addresses the limitations that arise because existing diffusion-based 3D generation models handle semantic understanding and geometric reasoning separately.

#Review #Multimodal Large Language Models #Mixture-of-Transformers #3D Native Generation #Context-aware Editing #Flow Matching #Sparse Voxel Representation

June 1, 2026

[Paper Review] UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering

This paper proposes UniSteer to address the scalability and compositional constraints of existing activation steering methods for controlling LLM behavior.

#Review #LLM Steering #Activation Space #Flow Matching #Text-Guided Control #Activation Inversion #Multi-Constraint #Zero-shot Classification

May 28, 2026

[Paper Review] SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

This paper proposes SmartDirector to overcome the limited control over narrative structure and temporal pacing that results from video generation models relying only on sparse conditions (text, start/end frames).

#Review #Video Generation #Keyframe-Conditioned #Narrative Pacing #Flow Matching #Multi-Chunk VAE #Director-Gen #Director-SR

May 28, 2026

[Paper Review] Geometry-Aware Image Flow Matching

Advanced generative models such as Continuous Normalizing Flows (CNF), Diffusion Models (DM), and Flow Matching (FM) rest on a Euclidean-geometry assumption: they treat image data as vectors in a high-dimensional Euclidean space.

#Review #Flow Matching #Spherical Geometry #Image Generation #Riemannian Manifold #Optimal Transport #Hyperspherical Projection #Generative Models

May 25, 2026
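To make the Euclidean assumption concrete, here is a small sketch (an illustration, not the paper's method) contrasting straight-line interpolation in Euclidean space with spherical linear interpolation (slerp) between two unit-norm vectors. The straight path leaves the unit sphere midway, while slerp stays on it. Treating data as points on a hypersphere is an assumption made here only for illustration, echoing the #Spherical Geometry tag.

```python
import math


def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Euclidean (straight-line) interpolation."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Spherical linear interpolation between unit vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
    theta = math.acos(dot)
    if theta < 1e-6:  # nearly identical directions: fall back to lerp
        return lerp(a, b, t)
    if math.pi - theta < 1e-6:  # antipodal points: the great circle is not unique
        raise ValueError("slerp is undefined for antipodal vectors")
    sa = math.sin((1 - t) * theta) / math.sin(theta)
    sb = math.sin(t * theta) / math.sin(theta)
    return [sa * x + sb * y for x, y in zip(a, b)]


def norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print("lerp midpoint norm: ", round(norm(lerp(a, b, 0.5)), 4))
    print("slerp midpoint norm:", round(norm(slerp(a, b, 0.5)), 4))
```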

[Paper Review] FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

This paper addresses the quality degradation and temporal-consistency problems that appear when video diffusion models are pushed to generate beyond the context length they were trained on.

#Review #Long Video Generation #Flow Matching #Tweedie Matching #Stochastic Early-Phase Sampling #Inference-time Framework #Diffusion Models

May 21, 2026

[Paper Review] Lance: Unified Multimodal Modeling by Multi-Task Synergy

This paper was proposed to address the performance degradation and limited task coverage that existing multimodal models suffer when they unify two heterogeneous objectives, understanding and generation.

#Review #Unified Multimodal Modeling #Multi-Task Synergy #Dual-Stream Architecture #Modality-Aware Rotary Positional Encoding #Autoregressive Modeling #Flow Matching

May 18, 2026

[Paper Review] KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

Existing alignment methods for video generation models mostly rely on noise-based exploration or SDE-based surrogate policies, which conflicts with the nature of distilled AR models that run on deterministic ODE dynamics.

#Review #Autoregressive Video Generation #Reinforcement Learning #Policy Optimization #Flow Matching #KV Caching #Causal-Semantic Exploration #Trajectory Velocity Energy

May 18, 2026

[Paper Review] PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution

This paper starts by pointing out the limitations of existing Text-SR methods on severely degraded text images. Prior work tries to exploit strong generative priors, but when the input is severely degraded these priors become unreliable noise and cause recognition errors.

#Review #Text Image Super-Resolution #Diffusion Model #Flow Matching #Uncertainty-Aware #Prior Rectification #Structure Refinement

May 14, 2026

[Paper Review] DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

This paper was designed to address the optimization interference and performance imbalance that existing multi-task reinforcement learning (RL) approaches suffer from.

#Review #Diffusion Models #On-Policy Distillation #Multi-Task Reinforcement Learning #Flow Matching #Preference Alignment

May 14, 2026

[Paper Review] Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

This paper points out that although recent unified multimodal models (UMMs) combine understanding and generation in one model, the two components are in practice designed as decoupled parts that do not interact, which limits how far performance can be pushed.

#Review #Unified Multimodal Models #Understanding-Oriented Post-Training #Generation Synergy #Flow Matching #Semantic Supervision #MetaQuery

May 10, 2026

[Paper Review] Normalizing Trajectory Models

This paper addresses the limitations of the Gaussian approximation that existing diffusion and flow matching models run into during few-step generation.

#Review #Normalizing Trajectory Models #Flow Matching #Normalizing Flows #Few-step Generation #Exact Likelihood #Stochastic Trajectory

May 10, 2026

[Paper Review] Flow-OPD: On-Policy Distillation for Flow Matching Models

This paper addresses the reward sparsity and gradient interference that arise during multi-task alignment of Flow Matching models.

#Review #Flow Matching #On-Policy Distillation #Reinforcement Learning #Multi-task Alignment #Manifold Anchor Regularization #Text-to-Image

May 10, 2026

[Paper Review] Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

This paper addresses the performance degradation caused by existing diffusion distillation methods relying too heavily on fixed discrete timesteps (discrete anchors) during training and inference.

#Review #Diffusion Models #Distillation #Continuous-Time Optimization #Distribution Matching #Few-Step Generation #Flow Matching

May 7, 2026

[Paper Review] MolmoAct2: Action Reasoning Models for Real-world Deployment

This paper addresses the fact that VLA models for generalist robot manipulation do not yet meet the practical requirements of real-world deployment.

#Review #Vision-Language-Action (VLA) Model #Embodied Reasoning #Flow Matching #Adaptive Depth Perception #Open-source Robotics #Real-world Deployment

May 4, 2026

[Paper Review] Generative Modeling with Orbit-Space Particle Flow Matching

This paper addresses the inability of modern grid-based generative models (diffusion, Flow Matching) to handle the inherent properties of particle systems effectively.

#Review #Generative Modeling #Flow Matching #Particle Systems #Orbit-Space Canonicalization #Geometric Probability Paths #Surface Normals #Arc-Length Terminal Velocity

May 4, 2026

[Paper Review] Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

This work seeks to establish the fundamental mathematical connection between the hierarchical information-refinement processes carried out by decision trees, a classic data-analysis model, and by diffusion models, a modern generative model.

#Review #Decision Trees #Diffusion Models #Global Trajectory Score Matching (GTSM) #Probability Flow ODE #Tabular Data #Knowledge Distillation #Flow Matching

May 3, 2026

[Paper Review] ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

This paper addresses two problems in existing high-quality human video generation work: it lacks joint control over pose, appearance, and camera viewpoint, and its performance is capped by the scarcity of large-scale, high-quality multi-view video data.

#Review #Human Video Generation #Image-First Synthesis #Flow Matching #Temporal Consistency #SMPL-X #Diffusion Transformer

April 22, 2026

[Paper Review] Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

This paper proposes the Cortex 2.0 architecture, which integrates a world model into the robot control loop to add future prediction and evaluation. From the current observation, the model uses the world model to generate $k$ candidate future trajectories in a visual latent space.

#Review #Vision-Language-Action Models #World Models #Robotic Manipulation #Plan-and-Act #Process-Reward Operator #Flow Matching #Cross-Embodiment

April 22, 2026
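The predict-then-evaluate loop described above can be pictured as best-of-$k$ selection. This is a generic sketch under loud assumptions: `rollout` is a hypothetical stand-in for the world model, `score` a hypothetical evaluator (the tags mention a Process-Reward Operator, whose actual form is not shown here), and trajectories are toy 1-D lists rather than visual latents. It is not Cortex 2.0's implementation.

```python
import random
from typing import Callable

Trajectory = list[float]


def plan_best_of_k(
    observation: float,
    rollout: Callable[[float, random.Random], Trajectory],
    score: Callable[[Trajectory], float],
    k: int,
    rng: random.Random,
) -> tuple[Trajectory, float]:
    """Sample k candidate futures from a (hypothetical) world model, score each,
    and return the highest-scoring candidate with its score."""
    if k < 1:
        raise ValueError("k must be >= 1")
    candidates = [rollout(observation, rng) for _ in range(k)]
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda sc: sc[0])
    return best, best_score


if __name__ == "__main__":
    goal = 1.0

    def toy_rollout(obs: float, rng: random.Random) -> Trajectory:
        # Hypothetical world-model stand-in: a 5-step random walk
        traj, x = [], obs
        for _ in range(5):
            x += rng.gauss(0.1, 0.2)
            traj.append(x)
        return traj

    def toy_score(traj: Trajectory) -> float:
        # Hypothetical evaluator: a final state closer to the goal scores higher
        return -abs(traj[-1] - goal)

    best, s = plan_best_of_k(0.0, toy_rollout, toy_score, k=8, rng=random.Random(0))
    print(f"best final state {best[-1]:.3f}, score {s:.3f}")
```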

[Paper Review] HP-Edit: A Human-Preference Post-Training Framework for Image Editing

This paper addresses two problems of existing image editing models: inconsistent quality in their SFT (Supervised Fine-Tuning) data, and outputs that diverge from actual human preferences.

#Review #Image Editing #Human-Preference Alignment #Reinforcement Learning #Flow Matching #Visual Large Language Model

April 21, 2026

[Paper Review] LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

This paper addresses the high memory cost and gradient explosion of existing direct-gradient methods for aligning Flow Matching models with human preferences.

#Review #Flow Matching #Preference Alignment #Direct-Gradient Method #Leap Trajectory #Trajectory-Similarity Weighting #Gradient Discounting

April 16, 2026

[Paper Review] RewardFlow: Generate Images by Optimizing What You Reward

This work aims to avoid the costly fine-tuning or unstable inversion that existing diffusion-based image editing models require, and to perform more precise and consistent edits in a zero-shot setting.

#Review #Diffusion Models #Flow Matching #Langevin Dynamics #Image Editing #Zero-shot Generation #Multi-reward Guidance #Adaptive Policy

April 9, 2026

[Paper Review] Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics

This work proposes Phantom, a physics-infused video generation framework. Phantom builds on the pretrained video diffusion model Wan2.2-TI2V and adds a parallel physical-dynamics branch that predicts physical states in latent space.

#Review #Video Generation #Physics-Infused #Flow Matching #Latent Dynamics #V-JEPA2 #Dual-Branch Architecture

April 9, 2026

[Paper Review] FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

This paper addresses the fact that existing multimodal generation relies on language-model-centric pipelines, which limits vision's own reasoning and generation capabilities.

#Review #Multimodal Generation #Flow Matching #Visual Prompts #Image-in Image-out #Visual Instruction Following #VisPrompt-5M #VP-Bench

April 8, 2026

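[Paper Review] Woosh: A Sound Effects Foundation Model

This paper proposes Woosh to fill the absence of a high-quality open-source foundation model specialized for sound-effect generation. Existing open models either support only low-resolution audio (capped at 16 kHz) or are skewed toward music generation, which limits their use in professional sound-effect production.

#Review #Foundation Model #Sound Effects #Latent Diffusion Model #Flow Matching #Audio-Visual Generation #Distillation

April 2, 2026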

[Paper Review] Unified Number-Free Text-to-Motion Generation Via Flow Matching

Existing text-to-motion models are mostly restricted to single-agent generation, and those that do handle multiple agents can only process a fixed number of them.

#Review #Text-to-Motion #Flow Matching #Number-Free Synthesis #Hierarchical Modeling #Multi-Person Interaction

March 30, 2026

[Paper Review] UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Generative AI is rapidly evolving toward unified multimodal models capable of interleaved generation, which offer the potential to solve complex image-synthesis tasks through iterative reasoning.

#Review #Unified Policy Optimization #Reinforcement Learning #Reasoning-Driven Generation #Interleaved Generation #Flow Matching #Markov Decision Process #Classifier-Free Guidance #Reward Hacking

March 24, 2026

[Paper Review] TrajLoom: Dense Future Trajectory Generation from Video

Future motion prediction is essential for video understanding and controllable video generation.

#Review #Dense Trajectory Generation #Future Motion Prediction #Video Understanding #Flow Matching #Variational Autoencoder #Spatiotemporal Consistency #On-policy Fine-tuning #Grid-Anchor Offset Encoding

March 24, 2026

[Paper Review] FASTER: Rethinking Real-Time Flow VLAs

Real-time execution is critical when deploying Vision-Language-Action (VLA) models on real robots.

#Review #Vision-Language-Action (VLA) Models #Real-Time Robotics #Action Chunking #Reaction Latency #Flow Matching #Horizon-Aware Schedule (HAS) #Time to First Action (TTFA)

March 19, 2026

[Paper Review] WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

Some recent Flow Matching models operate directly in pixel space to avoid the reconstruction bottleneck of latent autoencoders. However, the pixel manifold lacks semantic continuity, so the optimal-transport paths become severely entangled.

#Review #Image Generation #Flow Matching #Trajectory Conflict #Diffusion Transformers #Waypoint Diffusion Transformers #Just-Pixel AdaLN

March 17, 2026

[Paper Review] Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers

The main goal is to address the high computational cost of Diffusion Transformers (DiT), and in particular the practical deployment difficulties caused by spatial redundancy.

#Review #Diffusion Transformers #Spatial Acceleration #Training-Free #Generative AI #Flow Matching #ODE Solvers #Inference Speedup #Resource Allocation

March 11, 2026

[Paper Review] Streaming Autoregressive Video Generation via Diagonal Distillation

The goal is to improve the limited real-time streaming capability of large diffusion models and to fix the low quality of existing autoregressive models, which stems from their high computational cost.

#Review #Video Generation #Autoregressive Models #Diffusion Models #Distillation #Real-time #Streaming #Temporal Coherence #Flow Matching

March 10, 2026

[Paper Review] Distribution-Conditioned Transport

This paper aims to solve a common machine-learning problem: generalizing transport models to source and target distributions that were not observed during training.

#Review #Distribution-Conditioned Transport #Generative Distribution Embeddings #Optimal Transport #Flow Matching #Semi-Supervised Learning #Generalization #Single-cell Genomics #Batch Effect Transfer

March 5, 2026

[Paper Review] CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance

The goal is to address the instability, overshooting, and loss of semantic fidelity that standard Classifier-Free Guidance (CFG) exhibits at high guidance scales because it relies on linear control.

#Review #Diffusion Models #Classifier-Free Guidance #Control Theory #Sliding Mode Control #Text-to-Image Generation #Flow Matching #Generative AI #Robustness

March 3, 2026
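For reference, standard CFG combines the unconditional and conditional predictions by linear extrapolation, $\tilde{v} = v_{\text{uncond}} + w\,(v_{\text{cond}} - v_{\text{uncond}})$, where $w$ is the guidance scale. The sketch below uses plain lists with made-up values as hypothetical model outputs and shows how the guided output moves linearly with $w$, presumably the "linear control" behavior the summary refers to. It does not implement CFG-Ctrl's sliding-mode controller.

```python
def cfg_combine(
    v_uncond: list[float], v_cond: list[float], guidance_scale: float
) -> list[float]:
    """Standard classifier-free guidance on a single prediction vector.

    guidance_scale = 0 gives the unconditional prediction, 1 the conditional one,
    and values > 1 extrapolate past the conditional prediction.
    """
    if len(v_uncond) != len(v_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


if __name__ == "__main__":
    v_uncond = [0.0, 1.0]  # hypothetical unconditional prediction
    v_cond = [1.0, 1.5]    # hypothetical conditional prediction
    for w in (1.0, 3.0, 7.5):
        print(w, cfg_combine(v_uncond, v_cond, w))
```

Because the correction term scales with $w$ regardless of how far the sample has already moved, large scales can push the output well beyond the conditional prediction, which is one intuition for the overshooting described above.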

[Paper Review] Mode Seeking meets Mean Seeking for Fast Long Video Generation

This paper addresses the main bottlenecks encountered when scaling from generating short, few-second clips to generating minute-long videos.

#Review #Long Video Generation #Diffusion Models #Mode Seeking #Mean Seeking #Decoupled Diffusion Transformer #Flow Matching #Distribution Matching #Video Synthesis

March 1, 2026

[Paper Review] Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models

This work tackles length generalization in Video-to-Audio (V2A) models, so that a model trained on short videos can generate coherent, high-quality long-form audio at inference time.

#Review #Video-to-Audio Generation #Length Generalization #Multimodal Learning #Mamba Architecture #Hierarchical Networks #Flow Matching #Audio Synthesis

February 26, 2026

[Paper Review] Communication-Inspired Tokenization for Structured Image Representations

This paper addresses a limitation of existing image tokenizers: because they focus only on reconstruction and compression, they capture local texture rather than object-level semantic structure.

#Review #Image Tokenization #Structured Representation #Attentive Encoding #Flow Matching #Semantic Alignment #Compositional Generalization #Transformer Architecture

February 24, 2026

[Paper Review] SimVLA: A Simple VLA Baseline for Robotic Manipulation

In the fast-moving VLA field it is hard to pin down what actually drives performance gains. To address this, the paper proposes SimVLA, a simplified VLA baseline.

#Review #Robotic Manipulation #Vision-Language-Action (VLA) Models #Baseline Model #Modular Design #Flow Matching #Zero-Shot Generalization #Standardized Training #Efficiency

February 23, 2026

[Paper Review] SARAH: Spatially Aware Real-time Agentic Humans

This paper targets real-time, full-body 3D motion generation for spatially aware agents in VR, telepresence, and digital-human applications: agents that respond dynamically to the user's movement and speech while maintaining natural gaze.

#Review #Embodied Agents #Real-time #Conversational AI #Motion Generation #Spatially Aware #VR #Causal Models #Flow Matching #Gaze Control

February 22, 2026

[Paper Review] World Action Models are Zero-shot Policies

This paper addresses a key limitation of Vision-Language-Action (VLA) models: their poor generalization to unseen physical motions in new environments.

#Review #World Action Models #Video Diffusion Models #Zero-shot Generalization #Cross-embodiment Transfer #Real-time Control #Robotics #Foundation Models #Flow Matching

February 18, 2026

[Paper Review] Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

This paper aims to address two problems: the difficulty of real-time robot control caused by the high inference latency of large VLA models, and the loss of the pretrained VLM's visual-semantic knowledge (catastrophic forgetting).

#Review #Vision-Language-Action (VLA) #Real-Time Robotics #Diffusion Transformer #Flow Matching #Asynchronous Execution #Robot Manipulation #Pre-training #Catastrophic Forgetting

February 15, 2026

[Paper Review] FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching

This paper aims to solve the problems that arise when iterative generative policies, such as diffusion models and Flow Matching, are combined with Maximum Entropy Reinforcement Learning (Max-Ent RL).

#Review #Reinforcement Learning #Maximum Entropy RL #Kinetic Energy Regularization #Schrödinger Bridge #Generative Policies #Flow Matching #Actor-Critic

February 15, 2026

[Paper Review] Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

This paper aims to solve the sparse-reward problem that arises when Group Relative Policy Optimization (GRPO) is applied to Flow Matching models for text-to-image generation.

#Review #Reinforcement Learning #Flow Matching #Text-to-Image Generation #Sparse Rewards #Credit Assignment #Turning Points #Group Relative Policy Optimization

February 9, 2026
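GRPO scores each sample relative to the other samples generated for the same prompt. A minimal sketch, assuming a setup in which a single reward is computed on the final image and then credited to every sampling step: under that assumption all steps of a trajectory share one advantage, which illustrates the step-agnostic, sparse signal described above. The function names and the population-standard-deviation choice are assumptions, not the paper's code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward within its group of samples.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def per_step_advantages(advantages: list[float], num_steps: int) -> list[list[float]]:
    """Assumption: one terminal reward per trajectory, so every sampling step
    of trajectory i is credited with the same advantage."""
    return [[a] * num_steps for a in advantages]


if __name__ == "__main__":
    rewards = [0.2, 0.5, 0.9, 0.4]  # hypothetical rewards for 4 images from one prompt
    adv = group_relative_advantages(rewards)
    print([round(a, 3) for a in adv])
    print(per_step_advantages(adv, num_steps=3)[0])  # same value at every step
```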

[Paper Review] Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

This paper aims to solve the mode collapse that occurs in Distribution Matching Distillation (DMD), which is used to generate high-quality images quickly with few-step inference.

#Review #Diffusion Models #Model Distillation #Mode Collapse #Image Generation #Diversity Preservation #Flow Matching #Few-Step Synthesis

February 3, 2026

[Paper Review] Green-VLA: Staged Vision-Language-Action Model for Generalist Robots

This paper addresses chronic problems in robot learning: heterogeneous and low-quality data, and the limitations of behavior cloning (BC) on long-horizon tasks.

#Review #Vision-Language-Action #Generalist Robots #Staged Training #Reinforcement Learning #Multi-embodiment #Data Quality #Humanoid Robotics #Flow Matching

February 2, 2026

[Paper Review] DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

This work aims to fix the low reconstruction fidelity of generative autoencoders built on pretrained Vision Foundation Models (VFMs) while preserving efficient image generation.

#Review #Autoencoder #DINO #Vision Foundation Models #Image Generation #Image Reconstruction #Spherical Manifold #Diffusion Models #Flow Matching

February 1, 2026

[Paper Review] Transition Matching Distillation for Fast Video Generation

Large video diffusion models generate high-quality video, but their inefficient multi-step sampling makes them hard to use in real-time interactive applications; this paper sets out to address that.

#Review #Video Generation #Diffusion Models #Model Distillation #Few-Step Sampling #Transition Matching #Flow Matching #DMD2 #Efficiency

January 15, 2026

[Paper Review] SAM Audio: Segment Anything in Audio

This paper seeks to overcome the limitations of existing audio separation models, which are domain-specific or restricted to a single prompt modality. It aims to build a general-purpose foundation model for audio separation by unifying text, visual, and time-span prompting in a single framework.

#Review #Audio Source Separation #Foundation Models #Multimodal Prompting #Diffusion Transformers #Flow Matching #Self-Supervised Learning #Reference-Free Evaluation #Audio-Visual Learning

December 23, 2025

[Paper Review] Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

This paper presents the vision-action policy that won first place in the 2025 BEHAVIOR Challenge, which targets 50 diverse, long-horizon household tasks in photorealistic simulation.

#Review #Vision-Language-Action (VLA) models #Flow Matching #Embodied AI #Robot Manipulation #BEHAVIOR Challenge #Correlated Noise #Stage Tracking #Multi-Task Learning

December 14, 2025

[Paper Review] SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

This paper investigates whether a large-scale text-to-image diffusion model trained entirely in a Visual Foundation Model (VFM) representation space can match the performance of conventional VAE-based models.

#Review #Text-to-Image Generation #Latent Diffusion Model #Visual Foundation Model #DINOv3 #Flow Matching #High-Resolution Synthesis #VAE-free Generation

December 14, 2025

[Paper Review] TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

The goal is to fix the slow inference of current multi-step generative models (diffusion, Flow Matching), which need 40–100 NFE (number of function evaluations).

#Review #Generative Models #One-step Generation #Self-Adversarial Learning #Flow Matching #Large Language Models #Text-to-Image #Efficient Inference #Diffusion Models

December 7, 2025

[Paper Review] TV2TV: A Unified Framework for Interleaved Language and Video Generation

This paper seeks to overcome the limitations of existing models on video generation that requires complex semantic reasoning or repeated high-level planning. By decomposing video generation into an interleaved process of text and video generation, it aims to substantially improve visual quality and user controllability.

#Review #Video Generation #Language Modeling #Multimodal AI #Interleaved Generation #Flow Matching #Transformer #Controllability #World Models

December 4, 2025

[Paper Review] Generative Neural Video Compression via Video Diffusion Prior

This paper aims to fix the blurring, loss of detail, and perceptual flickering that existing video compression methods exhibit at ultra-low bitrates.

#Review #Neural Video Compression #Diffusion Models #Generative Models #Video Compression #Temporal Coherence #Perceptual Quality #Flow Matching #Video Diffusion Transformer (VideoDiT)

December 4, 2025

[Paper Review] DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models

This paper addresses the performance degradation of Vision-Language-Action (VLA) models under distribution shift and on complex multi-step manipulation tasks. The cause is that the learned representations fail to robustly capture task-relevant semantics, and the paper aims to improve VLA robustness through geometric regularization.

#Review #VLA Models #Flow Matching #Robotics #Robustness #Distribution Shift #Wasserstein Distance #Geometric Regularization #Representation Learning

December 2, 2025

[Paper Review] TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

The paper aims to develop TUNA, a native unified multimodal model (UMM) that handles both multimodal understanding and generation within a single framework. It seeks to overcome the limits imposed by the decoupled or biased visual representations of existing UMMs by building a unified, continuous visual representation space that works well for both understanding and generation.

#Review #Unified Multimodal Models #Visual Representation #VAE #Flow Matching #Multimodal Understanding #Multimodal Generation #Image Editing #State-of-the-Art

December 1, 2025

[Paper Review] Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration

This paper aims to fix the slow inference of 3D diffusion models.

#Review #3D Geometry Synthesis #Diffusion Models #Acceleration #Caching #Training-free #Flow Matching #Voxel Stabilization #Computational Efficiency

November 30, 2025

[Paper Review] Adversarial Flow Models

This paper addresses the training instability of GANs (Generative Adversarial Networks) and, for Flow Matching models, both the discretization error under coarse discretization and the cost of iterative inference.

#Review #Generative Models #Adversarial Flow Models #GANs #Flow Matching #Optimal Transport #Single-step Generation #Image Generation #Transformer Architecture

November 30, 2025

[Paper Review] Terminal Velocity Matching

The paper aims to build, in a single training stage, a generative model that produces high-quality samples quickly and efficiently and scales to high-dimensional data.

#Review #Generative Models #Flow Matching #Diffusion Models #One-Step Generation #Few-Step Generation #Wasserstein Distance #Transformer Architecture #Lipschitz Continuity

November 26, 2025

[Paper Review] DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

This paper addresses the slow training and inference and the low image quality of existing pixel diffusion models, which use a single Diffusion Transformer (DiT) to model both high-frequency signals and low-frequency semantics.

#Review #Pixel Diffusion #Image Generation #Frequency Decoupling #Diffusion Transformer (DiT) #Flow Matching #AdaLN #Text-to-Image Synthesis

November 24, 2025

[Paper Review] Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

This paper tackles a core AI/ML challenge: high-quality, consistent, and controllable image and video generation. Specifically, it develops Kandinsky 5.0, a family of state-of-the-art foundation models for image and 10-second video synthesis, aiming for top-tier quality and operational efficiency.

#Review #Image Generation #Video Generation #Diffusion Models #Flow Matching #Diffusion Transformer #NABLA #RLHF #Supervised Fine-tuning

November 19, 2025

[Paper Review] EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

This work addresses the limited realism of existing virtual try-on models, which depend on complex inputs such as agnostic person images, human pose, and DensePose, and offer little support for reference images.

#Review #Virtual Try-on #Diffusion Models #End-to-End Learning #Reference Images #Unpaired Data #Flow Matching #Transformer Architecture #Generative AI

November 9, 2025

[Paper Review] World Simulation with Video Foundation Models for Physical AI

To address the high cost and risk of training Physical AI systems, this paper aims to provide high-quality virtual world simulators.

#Review #Physical AI #World Simulation #Video Foundation Models #Flow Matching #Reinforcement Learning #Robotics #Autonomous Driving #Synthetic Data Generation

November 9, 2025

[Paper Review] UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Existing diffusion-based relighting methods suffer from physical inconsistencies, such as overexposed highlights and inaccurate shadows. This work aims to fix these problems. It develops UniLumos, a unified framework for physically plausible, finely controllable image and video relighting.

#Review #Relighting #Diffusion Models #Flow Matching #Physics-Plausible Feedback #Image-to-Video #Geometric Supervision #Path Consistency Learning #LumosBench

November 9, 2025

[Paper Review] π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Applying large-scale RL to flow-based VLA (Vision-Language-Action) models such as π0 and π0.5 requires action log-likelihoods, which are intractable to compute for these models. This paper aims to resolve that intractability.

#Review #Reinforcement Learning (RL) #Vision-Language-Action Models (VLAs) #Flow-based Models #Policy Optimization #Robotics #Flow Matching #SDE #MDP

November 9, 2025

[Paper Review] CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

In conditional generative models, the velocity network must handle two tasks at once. It must perform mass transport of the data distribution, and it must encode the conditioning information (conditional injection). The main goal of this work is to ease that burden, which speeds up training and improves generation quality.

#Review #Flow Matching #Conditional Generative Models #Reparameterization #Mode Collapse #Image Generation #Latent Space Alignment #Diffusion Models

September 24, 2025

[Paper Review] DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Applying online reinforcement learning (RL) to diffusion models raises problems specific to these models: intractable likelihoods and the constraints of the reverse sampling process. This paper aims to address both.

#Review #Diffusion Models #Reinforcement Learning #Online RL #Flow Matching #Forward Process #CFG-free #Image Generation #Negative-Aware FineTuning

September 23, 2025

[Paper Review] Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

This paper aims to tackle three core ML tasks with a single unified principle: generative modeling, representation learning, and classification.

#Review #Generative Modeling #Representation Learning #Classification #Unified Framework #Latent Space #Flow Matching #Deep Learning #Image Generation

September 22, 2025

[Paper Review] From Editor to Dense Geometry Estimator

This paper argues that image editing models based on the Diffusion Transformer (DiT) are better foundation models for monocular dense geometry estimation (depth and normals) than existing text-to-image (T2I) generative models. It demonstrates this claim and then builds on it to develop FE2E, a new framework that achieves strong zero-shot performance with limited training data.

#Review #Dense Geometry Estimation #Diffusion Transformer #Image Editing #Zero-shot Learning #Depth Estimation #Normal Estimation #Flow Matching #Logarithmic Quantization

September 5, 2025

[Paper Review] EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Existing VLA models are limited in domain coverage and flexibility. This work addresses those limits and targets generalist robot control, meaning robots capable of human-level, flexible multimodal reasoning and physical interaction in open-world environments.

#Review #Embodied AI #Robot Control #Vision-Language-Action Models #Multimodal Pretraining #Flow Matching #Foundation Models #Generalization #Real-world Robotics

September 1, 2025

[Paper Review] OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

Mask-guided image editing covers several subtasks: Image Fill, Extend, Object Removal, and Text Rendering. This paper addresses two problems across these subtasks: the limited generality of existing models and the inefficiency of task-specific supervised fine-tuning (SFT).

#Review #Image Generation #Mask-Guided Editing #Reinforcement Learning #Human Preference Learning #Vision-Language Models #Multi-Task Learning #Flow Matching

August 29, 2025

[Paper Review] TempFlow-GRPO: When Timing Matters for GRPO in Flow Models

GRPO (Group Relative Policy Optimization) training aligns text-to-image flow matching models with human preferences, but it does so inefficiently. The paper attributes this to two causes: an assumption of temporal uniformity and a lack of intermediate feedback signals. The goal is to fix that inefficiency.

#Review #Flow Matching #Reinforcement Learning #Human Preference Alignment #GRPO #Temporal Credit Assignment #Generative AI #Text-to-Image

August 20, 2025

[Paper Review] NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

Existing autoregressive (AR) models for text-to-image generation face two limitations: quantization loss and heavy reliance on diffusion models. This paper seeks to overcome both.

#Review #Autoregressive Models #Text-to-Image Generation #Continuous Latent Tokens #Flow Matching #Image Editing #Multimodal Learning #Transformer Architecture

August 15, 2025

[Paper Review] Marco-Voice Technical Report

This paper aims to develop Marco-Voice, a multifunctional speech synthesis system that unifies voice cloning and emotion control.

#Review #Speech Synthesis #Voice Cloning #Emotion Control #Text-to-Speech #Disentanglement #Contrastive Learning #Flow Matching #Emotional Speech Dataset

August 8, 2025

[Paper Review] SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

This paper aims to develop a unified, text-controllable music restoration and mastering model. The model handles a range of audio quality issues, including excessive reverberation, distortion, clipping, and tonal imbalance.

#Review #Music Restoration #Audio Mastering #Generative Models #Flow Matching #Text-to-Audio #Audio Quality Enhancement #Multi-task Learning #Dataset Creation

August 7, 2025

[Paper Review] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

To work effectively in real-world environments, robots need both multimodal reasoning and precise action generation. This paper addresses how to integrate the two.

#Review #Vision-Language-Action (VLA) #Instruction Tuning #Multimodal Reasoning #Robotic Manipulation #Catastrophic Forgetting #Mixture-of-Experts (MoE) #Flow Matching

August 5, 2025

[Paper Review] MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

This work addresses two problems with existing text-to-image (T2I) models. First, because they are trained on large uncurated datasets, they align poorly with user preferences. Second, applying reward models as a post-hoc step causes information loss and inefficiency.

#Review #Text-to-Image Generation #Multi-Reward Learning #Flow Matching #User Preference Alignment #Training Efficiency #Compositional Reasoning #Conditional Generation

October 31, 2025

[Paper Review] EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation

This work tackles a core challenge in computational protein engineering: designing enzyme backbones with substrate-specific functionality. Existing generative models fall short in three areas: the binding data they use, substrate-specific control, and flexibility in de novo enzyme backbone generation. The work aims to overcome these limitations.

#Review #Enzyme Design #Protein Engineering #Generative Models #Flow Matching #Substrate-Specific Control #Functional Site Prediction #Biomolecular AI #Deep Learning

October 31, 2025

[Paper Review] The Principles of Diffusion Models

This monograph analyzes the fundamental principles of diffusion models in depth. It traces how the various formulations derive from shared mathematical ideas and uses that analysis to present a unified perspective.

#Review #Diffusion Models #Generative AI #Variational Autoencoder #Energy-Based Models #Normalizing Flows #Score-Based SDEs #Flow Matching #Fokker-Planck Equation

October 30, 2025

[Paper Review] Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Applying Mixture-of-Experts (MoE) to Diffusion Transformers (DiTs) has so far yielded only limited performance gains. This paper aims to address that problem.

#Review #Mixture-of-Experts (MoE) #Diffusion Transformers (DiTs) #Routing Guidance #Semantic Specialization #Contrastive Learning #Image Generation #Flow Matching

October 29, 2025

[Paper Review] Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation

The goal is to speed up the slow sampling of image autoregressive (AR) models. In particular, the work aims to overcome two limitations: the performance drop in one-step sampling, and the reliance of Distilled Decoding 1 (DD1) on a predefined mapping.

#Review #Auto-regressive Models #Image Generation #One-step Sampling #Model Distillation #Conditional Score Distillation #Flow Matching #Generative Models

October 28, 2025

[Paper Review] ACG: Action Coherence Guidance for Flow-based VLA models

Vision-Language-Action (VLA) models trained via imitation learning, especially Diffusion and Flow Matching models, produce incoherent actions such as jerks, pauses, and jitter. This paper aims to fix that incoherence. Doing so prevents unstable behavior and the failures in precise manipulation that result from trajectory drift.

#Review #Action Coherence #Flow Matching #VLA Models #Guidance #Robotics #Imitation Learning #Transformer #Self-Attention

October 28, 2025

[Paper Review] Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

GRPO (Group Relative Policy Optimization) has two key limitations in flow-matching-based T2I (Text-to-Image) generation. It attributes advantages inaccurately, and it ignores the temporal dynamics of the generation process. This paper aims to address both.

#Review #Text-to-Image Generation #Reinforcement Learning #GRPO #Flow Matching #Chunk-level Optimization #Temporal Dynamics #Diffusion Models

October 27, 2025

[Paper Review] AlphaFlow: Understanding and Improving MeanFlow Models

This paper analyzes in depth why MeanFlow models succeed. The MeanFlow training objective has two components, trajectory flow matching and trajectory consistency, and the two are negatively correlated. This correlation causes optimization conflicts and slow convergence, which the paper aims to resolve.

#Review #Generative Models #Flow Matching #Consistency Models #MeanFlow #Curriculum Learning #Few-Step Generation #Image Generation

October 24, 2025

[Paper Review] pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

Distilling existing diffusion and flow-based generative models into few-step models introduces two problems: a quality-diversity trade-off and complex training procedures. This paper addresses both.

#Review #Diffusion Models #Flow Matching #Generative Models #Model Distillation #Imitation Learning #Few-Step Generation #Policy-Based AI #Text-to-Image

October 17, 2025

[Paper Review] X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

The goal is to build a generalizable Vision-Language-Action (VLA) model by training effectively across diverse robot platforms and heterogeneous datasets.

#Review #Vision-Language-Action (VLA) Models #Soft Prompts #Transformer #Cross-Embodiment #Robotics #Pretraining #Domain Adaptation #Flow Matching

October 16, 2025

[Paper Review] OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows

This paper aims to overcome two fundamental limitations: the strictly sequential generation of autoregressive (AR) models and the fixed-length generation of diffusion models.

#Review #Non-Autoregressive #Multimodal Generation #Edit Flows #Flow Matching #Interleaved Generation #Text-to-Image Synthesis #Unified Models

October 8, 2025

[Paper Review] Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

Existing diffusion and flow-based generative models rely on non-equilibrium, time-conditional dynamics. To overcome the limitations of this approach, the paper proposes Equilibrium Matching (EqM), a new generative modeling framework that learns a single time-invariant equilibrium gradient.

#Review #Generative Models #Equilibrium Dynamics #Energy-Based Models (EBMs) #Flow Matching #Diffusion Models #Optimization-Based Sampling #Image Generation

October 8, 2025

[Paper Review] Deforming Videos to Masks: Flow Matching for Referring Video Segmentation

The existing 'locate-then-segment' paradigm for Referring Video Object Segmentation (RVOS) suffers from an information bottleneck and weak temporal consistency. As a result, it struggles with complex language and dynamic videos. The goal is to address these limitations.

#Review #Referring Video Object Segmentation #Flow Matching #Video Segmentation #Generative Models #Text-to-Video #Continuous Flow #Diffusion Models

October 8, 2025