#Latent Space

35 posts

[Paper Review] Latent-Identity Tuning in Text-to-Image Personalization Models

This paper addresses the fact that existing Text-to-Image personalization models excel at reproducing a specific individual's identity but lack the ability to finely modify or control the generated identity.

#Review #Text-to-Image #Personalization #Identity Tuning #Latent Space #Q-Former #Fine-grained Editing

July 13, 2026

[Paper Review] MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

This paper addresses the temporal inconsistency and information loss that existing video generation models face when generating long videos. Existing frame-by-frame or short-segment generation approaches lose global structure as time progresses.

#Review #Video Generation #Hierarchical Latents #Long-Range Consistency #Diffusion Models #Latent Space #Spatiotemporal Modeling

June 9, 2026

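[Paper Review] Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

This study addresses the problem that the internal workings of TTS language models remain a "black box", which makes it hard to control specific speech attributes precisely. Existing speech models must retrain the entire model or rely on prompt engineering to achieve a particular style or speaker conversion, which limits both the precision and the efficiency of control.

#Review #Sparse Autoencoders #Text-to-Speech #Mechanistic Interpretability #Latent Space #Controllable Generation

June 9, 2026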

[Paper Review] Latent Spatial Memory for Video World Models

This paper proposes Mirage to address two weaknesses of existing video world models: their limited ability to maintain 3D spatial consistency and their excessive computational cost.

#Review #Video Generation #Spatial Memory #3D-consistent Video Generation #Video World Models #Latent Space #Diffusion Models

June 8, 2026

[Paper Review] GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

This study aims to address the structural collapse that classical Multi-view Reconstruction techniques suffer under complex lighting or sparse-view conditions.

#Review #3D Scene Reconstruction #Generative Priors #Multi-View Stereo #Diffusion Models #Neural Rendering #Latent Space

May 24, 2026

[Paper Review] ChangeFlow -- Latent Rectified Flow for Change Detection in Remote Sensing

This paper points out that existing RSCD (Remote Sensing Change Detection) studies rely mainly on pixel-level deterministic classification (discriminative classification), which limits their regional coherence and their handling of ambiguity.

#Review #Remote Sensing Change Detection #Rectified Flow #Generative Models #Latent Space #Diffusion Transformer #Coherence #Confidence Estimation

May 17, 2026

[Paper Review] The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

This paper points out that modern language-based models still rely on explicit token-level generation and therefore face structural limitations.

#Review #Latent Space #Language-based Models #Implicit Reasoning #Multimodal Computation #Embodied AI #Latent Representation #Machine-native

April 2, 2026

[Paper Review] SegviGen: Repurposing 3D Generative Model for Part Segmentation

Existing 3D Part Segmentation methods face several fundamental limitations.

#Review #3D Part Segmentation #Generative Models #Diffusion Models #Latent Space #Limited Supervision #Multi-Task Learning

March 17, 2026

[Paper Review] LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

This paper addresses the problem that LLM-based dense retrievers, despite their strong reasoning capabilities, cannot apply that reasoning to complex queries without incurring high latency.

#Review #Dense Retrieval #LLMs #Reasoning #Knowledge Distillation #Latent Space #Self-Distillation #Chain-of-Thought

March 2, 2026

[Paper Review] Imagination Helps Visual Reasoning, But Not Yet in Latent Space

This paper aims to analyze in depth the effectiveness and underlying mechanisms of latent-space visual reasoning (Latent Visual Reasoning, LVR) in Multimodal Large Language Models (MLLMs), and to present an alternative approach that overcomes its limitations.

#Review #Visual Reasoning #Latent Space #Causal Mediation Analysis #Multimodal LLMs #Textual Imagination #Model Interpretation #Latent Tokens

February 26, 2026

[Paper Review] Causal Motion Diffusion Models for Autoregressive Motion Generation

This paper addresses the lack of causality in existing motion diffusion models and the instability and error accumulation of autoregressive models, with the goal of high-quality, temporally ordered motion generation.

#Review #Motion Generation #Diffusion Models #Autoregressive Models #Causal Modeling #Latent Space #Text-to-Motion #Human Motion Synthesis #Streaming Generation

February 26, 2026

[Paper Review] Image Generation with a Sphere Encoder

The goal is to overcome the slow and costly image generation of existing diffusion models and autoregressive models, and to develop an efficient generative framework that can produce sharp images with just a single forward pass.

#Review #Image Generation #Sphere Encoder #Autoencoder #Latent Space #Few-Step Generation #Conditional Generation #Diffusion Models #Perceptual Loss

February 25, 2026

[Paper Review] LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

The work addresses the "continuity-discretization gap" that arises when large language models (LLMs) in chemistry rely too heavily on explicit natural-language Chain-of-Thought (CoT) reasoning.

#Review #Chemical Reasoning #Large Language Models (LLMs) #Chain-of-Thought (CoT) #Latent Space #Molecular Optimization #Inference Efficiency #Reinforcement Learning #Chemical AI

February 9, 2026

[Paper Review] Revisiting Diffusion Model Predictions Through Dimensionality

The main goal is to provide a quantitative, theoretical explanation of why the optimal prediction target (ε, v, x) in diffusion models depends on the data's intrinsic dimension and ambient dimension, and to develop a method that learns the prediction target automatically from data.

#Review #Diffusion Models #Prediction Target #Dimensionality #Latent Space #Pixel Space #Generative Models #Theoretical Analysis #k-Diff

February 1, 2026

[Paper Review] iFSQ: Improving FSQ for Image Generation with 1 Line of Code

The goal is to bridge the divide between Autoregressive (AR) models and Diffusion models in image generation and to build a unified tokenizer for both.

#Review #Finite Scalar Quantization (FSQ) #Image Generation #Autoregressive Models #Diffusion Models #Quantization #Tokenization #Representation Alignment (REPA) #Latent Space

January 26, 2026

[Paper Review] Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

This paper aims to overcome the limitations that the low-dimensional latent space of conventional variational autoencoders (VAEs) may impose on large-scale text-to-image (T2I) generation models.

#Review #Text-to-Image Generation #Diffusion Models #Representation Autoencoder #Latent Space #Large-Scale Models #Unified Models #Noise Scheduling

January 22, 2026

[Paper Review] Brain-Grounded Axes for Reading and Steering LLM States

This study focuses on the problem that interpretability directions in LLMs (large language models) often lack external grounding. To address this, it aims to define human brain activity as a stable, externally grounded coordinate system for interpreting and steering the internal states of LLMs.

#Review #LLM Interpretability #Brain-Grounded AI #MEG #Phase-Locking Value #ICA #LLM Steering #Neural Decoding #Latent Space

December 22, 2025

[Paper Review] Distribution Matching Variational AutoEncoder

This paper addresses the problem that VAEs and foundation-model encoders in visual generative models do not explicitly shape the distribution of the latent space.

#Review #Variational Autoencoder (VAE) #Distribution Matching #Diffusion Models #Latent Space #Self-supervised Learning (SSL) Features #Generative Models #ImageNet #Tokenizer

December 8, 2025

[Paper Review] World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

This paper addresses two problems that state-of-the-art controllable video models commonly suffer from: hallucination and a limited ability to express uncertainty.

#Review #Controllable Video Generation #Uncertainty Quantification #Video Models #Calibration #Out-of-Distribution Detection #Proper Scoring Rules #Latent Space

December 7, 2025

[Paper Review] REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

To address the erosion of trust caused by the spread of fake news on social media, the work aims to overcome the limitations of existing LLM-based fact-checking systems, including dependence on external knowledge, high latency, hallucination, and low interpretability.

#Review #Fact-Checking #Explainable AI (XAI) #Large Language Models (LLMs) #Self-Refinement #Latent Space #Disentanglement #Steering Vectors #Misinformation

December 4, 2025

[Paper Review] Video Generation Models Are Good Latent Reward Models

The goal is to resolve the existing limitations of Reward Feedback Learning (ReFL), which aligns video generation models with human preferences: high memory usage, long training time, and a lack of supervision at early generation stages.

#Review #Video Generation #Reward Feedback Learning #Latent Space #Diffusion Models #Human Preferences #Motion Quality #Process-aware

November 27, 2025

[Paper Review] Latent Collaboration in Multi-Agent Systems

This paper aims to resolve the inefficiency and information loss that arise because existing large language model (LLM)-based multi-agent systems (MAS) rely on text-based reasoning and communication.

#Review #Multi-Agent Systems #Large Language Models #Latent Space #Latent Reasoning #Latent Communication #KV Cache #Computational Efficiency #Training-Free

November 26, 2025

[Paper Review] Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs

This paper seeks to overcome the limitations of timestep distillation, a technique for improving the generation efficiency of diffusion models.

#Review #Diffusion Models #Timestep Distillation #Consistency Models #Latent Space #Image-Free Training #Efficiency Optimization #Trajectory Sampling #Continuous-Time Learning

November 26, 2025

[Paper Review] I-GLIDE: Input Groups for Latent Health Indicators in Degradation Estimation

This paper aims to improve the quality of health indicators (HI) used for RUL (Remaining Useful Life) prediction in complex multi-sensor systems.

#Review #Health Indicator (HI) #Remaining Useful Life (RUL) #Uncertainty Quantification (UQ) #Autoencoder (AE) #Latent Space #Degradation Modeling #Prognostics #Condition-Based Maintenance

November 26, 2025

[Paper Review] One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

This paper aims to resolve the slowdown, higher cost, and artifacts that arise when existing diffusion models sample high-resolution images directly, and to overcome the extra latency and artifacts of post-hoc pixel-space super-resolution (SR).

#Review #Latent Diffusion Models #Super-Resolution #Upscaling Adapter #Image Generation #Latent Space #Multi-scale Learning #Cross-VAE

November 13, 2025

[Paper Review] DIMO: Diverse 3D Motion Generation for Arbitrary Objects

This study seeks to overcome the limitation that existing 4D generation models either produce only a single motion per object or handle only category-restricted motions. It aims to instantly generate diverse 3D motions for arbitrary objects from a single image, using a single generative model in a single forward pass.

#Review #3D Motion Generation #Generative Models #Arbitrary Objects #Neural Key Points #Latent Space #4D Content Generation #Diffusion Models #3D Gaussian Splatting

November 10, 2025

[Paper Review] SIM-CoT: Supervised Implicit Chain-of-Thought

Despite their token efficiency, Implicit Chain-of-Thought (CoT) models face a persistent performance gap relative to explicit CoT and a core "latent instability" problem.

#Review #Implicit Reasoning #Chain-of-Thought #LLM #Latent Space #Supervised Learning #Model Stability #Interpretability

September 25, 2025

[Paper Review] Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

This paper aims to solve three core ML tasks (Generative Modeling, Representation Learning, and Classification) under a single unifying principle.

#Review #Generative Modeling #Representation Learning #Classification #Unified Framework #Latent Space #Flow Matching #Deep Learning #Image Generation

September 22, 2025

[Paper Review] InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

This paper addresses the problem that, in existing diffusion models, compute requirements grow quadratically with resolution, so that generating a 4K image takes more than 100 seconds.

#Review #Image Synthesis #Resolution-Agnostic #Diffusion Models #Latent Space #VAE Decoder #High-Resolution Image Generation #Generative AI #Transformer Architecture

September 15, 2025

[Paper Review] Collaborative Multi-Modal Coding for High-Quality 3D Generation

This paper addresses the problem that existing 3D generation models rely on a single modality (e.g., RGB images), which narrows the range of usable training data and overlooks the complementary benefits of multimodal data.

#Review #3D Generation #Multi-modal Learning #Diffusion Models #Triplane Representation #Collaborative Coding #Image-to-3D #Latent Space

August 29, 2025

[Paper Review] VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

This paper aims to overcome the inconsistency and imprecision of existing 2D image-based 3D editing methods and to perform precise, coherent, training-free local 3D editing in native 3D latent space.

#Review #3D Editing #Training-Free #Diffusion Models #Latent Space #3D Inversion #Contextual Feature Replacement #3D Consistency #Edit3D-Bench

August 27, 2025

[Paper Review] Next Visual Granularity Generation

This paper addresses the problem that existing image generation models treat images as flat or unstructured data, which limits fine-grained control and leaves them prone to error accumulation.

#Review #Image Generation #Granularity Control #Structured Representation #Hierarchical Generation #Coarse-to-fine #Visual Tokenization #Latent Space

August 19, 2025

[Paper Review] Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs

The goal is to address the difficulties that Multimodal Large Language Models (MLLMs) face in scenarios requiring complex visual planning and imagination, and to improve multimodal reasoning through visual thought by giving MLLMs an internal visual scratchpad.

#Review #Multimodal LLMs #Visual Reasoning #Latent Space #Sketch Generation #Visual Thinking #Autoregressive Generation #Interpretability

October 29, 2025

[Paper Review] CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

This paper addresses the degraded generalization and poor handling of long-tail scenarios seen in autonomous driving models that rely solely on imitation learning (IL). To also overcome the sample inefficiency and unstable convergence of reinforcement learning (RL), it aims to integrate IL and RL effectively and develop a more robust, better-generalizing autonomous driving policy.

#Review #Autonomous Driving #Imitation Learning #Reinforcement Learning #World Models #Latent Space #Dual-Policy #Competitive Learning

October 16, 2025

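[Paper Review] DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

This paper aims to address the high training and inference costs of existing video diffusion models and thereby greatly improve the efficiency of high-resolution, long-duration video generation. In particular, it focuses on developing a framework that transitions efficiently to a deeply compressed latent space while preserving the quality of the pretrained model.

#Review #Video Generation #Diffusion Models #Video Autoencoder #Deep Compression #Model Acceleration #Fine-tuning #Latent Space #Temporal Modeling

October 1, 2025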