
#Image Generation

108 posts

[Paper Review] GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

[Paper Review] UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

[Paper Review] Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

[Paper Review] ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

[Paper Review] Gen-Searcher: Reinforcing Agentic Search for Image Generation

[Paper Review] Representation Alignment for Just Image Transformers is not Easier than You Think

[Paper Review] WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

[Paper Review] Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

[Paper Review] UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations

[Paper Review] InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

[Paper Review] Dynamic Chunking Diffusion Transformer

[Paper Review] Enhancing Spatial Understanding in Image Generation via Reward Modeling

[Paper Review] SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models

[Paper Review] Image Generation with a Sphere Encoder

[Paper Review] UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

[Paper Review] UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model

[Paper Review] DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

[Paper Review] Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss

[Paper Review] Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

[Paper Review] Balancing Understanding and Generation in Discrete Diffusion Models

[Paper Review] PaperBanana: Automating Academic Illustration for AI Scientists

[Paper Review] DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

[Paper Review] OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

[Paper Review] UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation

[Paper Review] Boosting Latent Diffusion Models via Disentangled Representation Alignment

[Paper Review] E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

[Paper Review] DreamOmni3: Scribble-based Editing and Generation

[Paper Review] Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

[Paper Review] A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

[Paper Review] Exploring MLLM-Diffusion Information Transfer with MetaCanvas

[Paper Review] VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction

[Paper Review] Rethinking Training Dynamics in Scale-wise Autoregressive Generation

[Paper Review] LongCat-Image Technical Report

[Paper Review] Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

[Paper Review] Glance: Accelerating Diffusion Models with 1 Sample

[Paper Review] The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

[Paper Review] The Collapse of Patches

[Paper Review] From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

[Paper Review] Architecture Decoupling Is Not All You Need For Unified Multimodal Model

[Paper Review] Canvas-to-Image: Compositional Image Generation with Multimodal Controls

[Paper Review] DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

[Paper Review] Diversity Has Always Been There in Your Visual Autoregressive Models

[Paper Review] Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

[Paper Review] One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

[Paper Review] Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals

[Paper Review] RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

[Paper Review] OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

[Paper Review] HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models

[Paper Review] Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation

[Paper Review] CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

[Paper Review] Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

[Paper Review] Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

[Paper Review] Reconstruction Alignment Improves Unified Multimodal Models

[Paper Review] Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

[Paper Review] OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

[Paper Review] Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

[Paper Review] Emu3.5: Native Multimodal Models are World Learners

[Paper Review] Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation

[Paper Review] Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

[Paper Review] Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

[Paper Review] Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

[Paper Review] Heptapod: Language Modeling on Visual Signals

[Paper Review] Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

[Paper Review] Factuality Matters: When Image Generation and Editing Meet Structured Visuals

[Paper Review] AlphaFlow: Understanding and Improving MeanFlow Models

[Paper Review] ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

[Paper Review] Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

[Paper Review] BLIP3o-NEXT: Next Frontier of Native Image Generation