#Diffusion Models

390 posts

[Paper Review] YoCausal: How Far is Video Generation from World Model? A Causality Perspective

[Paper Review] Colored Noise Diffusion Sampling

[Paper Review] CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation

[Paper Review] DrawMotion: Generating 3D Human Motions by Freehand Drawing

[Paper Review] Video Models Can Reason with Verifiable Rewards

[Paper Review] PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

[Paper Review] From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing

[Paper Review] RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

[Paper Review] Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

[Paper Review] Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

[Paper Review] RewardFlow: Generate Images by Optimizing What You Reward

[Paper Review] Lighting-grounded Video Generation with Renderer-based Agent Reasoning

[Paper Review] FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

[Paper Review] Representation Alignment for Just Image Transformers is not Easier than You Think

[Paper Review] RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

[Paper Review] DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

[Paper Review] Repurposing Geometric Foundation Models for Multi-view Diffusion

[Paper Review] LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

[Paper Review] EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

[Paper Review] Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

[Paper Review] From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

[Paper Review] Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

[Paper Review] TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

[Paper Review] Scale Space Diffusion

[Paper Review] HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising

[Paper Review] CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing

[Paper Review] WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching

[Paper Review] WildActor: Unconstrained Identity-Preserving Video Generation

[Paper Review] Physical Simulator In-the-Loop Video Generation

[Paper Review] RealWonder: Real-Time Physical Action-Conditioned Video Generation

[Paper Review] HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

[Paper Review] DreamWorld: Unified World Modeling in Video Generation

[Paper Review] CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

[Paper Review] Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

[Paper Review] Beyond Language Modeling: An Exploration of Multimodal Pretraining

[Paper Review] WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

[Paper Review] From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

[Paper Review] SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

[Paper Review] Causal Motion Diffusion Models for Autoregressive Motion Generation

[Paper Review] Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

[Paper Review] SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models

[Paper Review] Image Generation with a Sphere Encoder

[Paper Review] One-step Language Modeling via Continuous Denoising

[Paper Review] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

[Paper Review] Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

[Paper Review] FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

[Paper Review] SLA2: Sparse-Linear Attention with Learnable Routing and QAT

[Paper Review] dVoting: Fast Voting for dLLMs

[Paper Review] Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching

[Paper Review] DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

[Paper Review] Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss

[Paper Review] WorldCompass: Reinforcement Learning for Long-Horizon World Models

[Paper Review] Context Forcing: Consistent Autoregressive Video Generation with Long Context

[Paper Review] Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

[Paper Review] Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

[Paper Review] PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

[Paper Review] Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

[Paper Review] Revisiting Diffusion Model Predictions Through Dimensionality

[Paper Review] DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

[Paper Review] DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

[Paper Review] SkyReels-V3 Technique Report

[Paper Review] VideoMaMa: Mask-Guided Video Matting via Generative Prior

[Paper Review] CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

[Paper Review] Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

[Paper Review] CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

[Paper Review] Alterbute: Editing Intrinsic Attributes of Objects in Images

[Paper Review] Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

[Paper Review] End-to-End Video Character Replacement without Structural Guidance

[Paper Review] RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

[Paper Review] Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

[Paper Review] DreamStyle: A Unified Framework for Video Stylization

[Paper Review] Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

[Paper Review] M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

[Paper Review] GARDO: Reinforcing Diffusion Models without Reward Hacking

[Paper Review] Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

[Paper Review] Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

[Paper Review] On the Role of Discreteness in Diffusion LLMs

[Paper Review] DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

[Paper Review] UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

[Paper Review] DreamOmni3: Scribble-based Editing and Generation

[Paper Review] Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

[Paper Review] Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

[Paper Review] ProEdit: Inversion-based Editing From Prompts Done Right

[Paper Review] InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

[Paper Review] Spatia: Video Generation with Updatable Spatial Memory

[Paper Review] TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

[Paper Review] SemanticGen: Video Generation in Semantic Space

[Paper Review] Region-Constraint In-Context Generation for Instructional Video Editing

[Paper Review] MatSpray: Fusing 2D Material World Knowledge on 3D Geometry

[Paper Review] Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

[Paper Review] The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

[Paper Review] StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

[Paper Review] RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

[Paper Review] FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering

[Paper Review] FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction

[Paper Review] Robust and Calibrated Detection of Authentic Multimedia Content

[Paper Review] ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

[Paper Review] Towards Interactive Intelligence for Digital Humans

[Paper Review] V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

[Paper Review] Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

[Paper Review] PersonaLive! Expressive Portrait Image Animation for Live Streaming

[Paper Review] Exploring MLLM-Diffusion Information Transfer with MetaCanvas

[Paper Review] ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning

[Paper Review] H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos

[Paper Review] VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

[Paper Review] Composing Concepts from Images and Videos via Concept-prompt Binding

[Paper Review] TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

[Paper Review] Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality

[Paper Review] OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

[Paper Review] MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

[Paper Review] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

[Paper Review] Scaling Zero-Shot Reference-to-Video Generation

[Paper Review] ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

[Paper Review] EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

[Paper Review] RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

[Paper Review] NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

[Paper Review] Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

[Paper Review] LATTICE: Democratize High-Fidelity 3D Generation at Scale

[Paper Review] Generative Neural Video Compression via Video Diffusion Prior

[Paper Review] BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

[Paper Review] RELIC: Interactive Video World Model with Long-Horizon Memory

[Paper Review] CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

[Paper Review] Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

[Paper Review] MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

[Paper Review] Glance: Accelerating Diffusion Models with 1 Sample

[Paper Review] DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

[Paper Review] Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

[Paper Review] Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation

[Paper Review] What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

[Paper Review] The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

[Paper Review] OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

[Paper Review] Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

[Paper Review] AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

[Paper Review] MIRA: Multimodal Iterative Reasoning Agent for Image Editing

[Paper Review] Canvas-to-Image: Compositional Image Generation with Multimodal Controls

[Paper Review] Block Cascading: Training Free Acceleration of Block-Causal Video Models

[Paper Review] PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding

[Paper Review] MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts

[Paper Review] DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

[Paper Review] UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

[Paper Review] SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

[Paper Review] Controllable Layer Decomposition for Reversible Multi-Layer Image Generation

[Paper Review] Taming Generative Synthetic Data for X-ray Prohibited Item Detection

[Paper Review] Planning with Sketch-Guided Verification for Physics-Aware Video Generation

[Paper Review] Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

[Paper Review] A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

[Paper Review] EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation

[Paper Review] Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions

[Paper Review] DIMO: Diverse 3D Motion Generation for Arbitrary Objects

[Paper Review] EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

[Paper Review] Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation

[Paper Review] Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

[Paper Review] UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

[Paper Review] MotionStream: Real-Time Video Generation with Interactive Motion Controls

[Paper Review] Beyond Objects: Contextual Synthetic Data Generation for Fine-Grained Classification

[Paper Review] X-Streamer: Unified Human World Modeling with Audiovisual Interaction

[Paper Review] WoW: Towards a World omniscient World model Through Embodied Interaction

[Paper Review] Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation

[Paper Review] LongLive: Real-time Interactive Long Video Generation

[Paper Review] HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models

[Paper Review] FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

[Paper Review] Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

[Paper Review] Does FLUX Already Know How to Perform Physically Plausible Image Composition?

[Paper Review] PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

[Paper Review] CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

[Paper Review] OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

[Paper Review] SPATIALGEN: Layout-guided 3D Indoor Scene Generation

[Paper Review] Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

[Paper Review] LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

[Paper Review] InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

[Paper Review] FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies

[Paper Review] HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

[Paper Review] UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

[Paper Review] Interleaving Reasoning for Better Text-to-Image Generation

[Paper Review] LuxDiT: Lighting Estimation with Video Diffusion Transformer

[Paper Review] Durian: Dual Reference-guided Portrait Animation with Attribute Transfer

[Paper Review] MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

[Paper Review] FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models

[Paper Review] USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

[Paper Review] MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

[Paper Review] AudioStory: Generating Long-Form Narrative Audio with Large Language Models

[Paper Review] Wan-S2V: Audio-Driven Cinematic Video Generation

[Paper Review] VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

[Paper Review] SpotEdit: Evaluating Visually-Guided Image Editing Methods

[Paper Review] MV-RAG: Retrieval Augmented Multiview Diffusion

[Paper Review] SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

[Paper Review] Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

[Paper Review] Precise Action-to-Video Generation Through Visual Action Prompts

[Paper Review] Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

[Paper Review] FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation

[Paper Review] Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

[Paper Review] Matrix-3D: Omnidirectional Explorable 3D World Generation

[Paper Review] CharacterShot: Controllable and Consistent 4D Character Animation

[Paper Review] Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control

[Paper Review] LightSwitch: Multi-view Relighting with Material-guided Diffusion

[Paper Review] Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression

[Paper Review] The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models

[Paper Review] Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

[Paper Review] Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

[Paper Review] Multi-human Interactive Talking Dataset

[Paper Review] LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

[Paper Review] LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer

[Paper Review] The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

[Paper Review] FullPart: Generating each 3D Part at Full Resolution

[Paper Review] The Principles of Diffusion Models

[Paper Review] RegionE: Adaptive Region-Aware Generation for Efficient Image Editing

[Paper Review] UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

[Paper Review] EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization

[Paper Review] Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

[Paper Review] Learning an Image Editing Model without Image Editing Pairs

[Paper Review] DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

[Paper Review] PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

[Paper Review] InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

[Paper Review] FlashWorld: High-quality 3D Scene Generation within Seconds

[Paper Review] CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

[Paper Review] Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models

[Paper Review] FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

[Paper Review] Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

[Paper Review] UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections

[Paper Review] Fidelity-Aware Data Composition for Robust Robot Generalization

[Paper Review] WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

[Paper Review] StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation

[Paper Review] OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

[Paper Review] LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation

[Paper Review] Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

[Paper Review] Deforming Videos to Masks: Flow Matching for Referring Video Segmentation

[Paper Review] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

[Paper Review] Factuality Matters: When Image Generation and Editing Meet Structured Visuals

[Paper Review] ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

[Paper Review] Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

[Paper Review] RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling

[Paper Review] Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

[Paper Review] HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

[Paper Review] DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

[Paper Review] Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

[Paper Review] PICABench: How Far Are We from Physically Realistic Image Editing?

[Paper Review] Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

[Paper Review] Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

[Paper Review] LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

[Paper Review] Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation

[Paper Review] BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration

[Paper Review] MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

[Paper Review] MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification

[Paper Review] DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder