Review

[Paper Review] Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets

This paper aims to address the problem of information leakage in video-derived datasets.

#Review #Data Leakage #Video Datasets #Clustering #Frame Selection #Deep Learning #Object Detection #Dataset Partitioning #Dimensionality Reduction

November 30, 2025

[Paper Review] FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

The paper points out that the homogeneous model architectures assumed by existing FL methods are unrealistic, and aims to strike an effective balance among performance, privacy, and communication overhead in model-heterogeneous FL settings.

#Review #Federated Learning #Model Heterogeneity #Representation Learning #Privacy Preservation #Communication Efficiency #Entangled Representation #Knowledge Transfer

November 30, 2025

[Paper Review] Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration

This paper aims to address the slow inference speed of 3D diffusion models.

#Review #3D Geometry Synthesis #Diffusion Models #Acceleration #Caching #Training-free #Flow Matching #Voxel Stabilization #Computational Efficiency

November 30, 2025

[Paper Review] Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

This study tackles the challenge of building "machines that can remember" by enabling large language models (LLMs) to process ultra-long context efficiently.

#Review #Large Language Models #Long Context #Sparse Attention #Hierarchical Sparse Attention (HSA) #Length Generalization #Mixture of Experts (MoE) #Transformer

November 30, 2025

[Paper Review] DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

This paper aims to address the "action degeneration" problem that arises in Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action (VLA) #Embodied AI #Action Degeneration #Data Pruning #Knowledge Distillation #Multi-modal Reasoning #Robot Learning #VLA Score

November 30, 2025

[Paper Review] DiP: Taming Diffusion Models in Pixel Space

This study aims to address a fundamental problem of diffusion models: the trade-off between generation quality and computational efficiency.

#Review #Diffusion Models #Pixel Space #Latent Diffusion Models (LDMs) #Diffusion Transformer (DiT) #Patch Detailer Head #Global-Local Modeling #Computational Efficiency #ImageNet

November 30, 2025

[Paper Review] DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

The paper addresses the limitations of final-answer-based rewards for mathematical reasoning in large language models (LLMs): such rewards are hard to apply to proof tasks and do not guarantee that the reasoning itself is correct.

#Review #Mathematical Reasoning #Large Language Models (LLMs) #Proof Verification #Self-Verification #Reinforcement Learning (RL) #Theorem Proving #Meta-Verification #Iterative Refinement

November 30, 2025

[Paper Review] Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

This paper challenges the conventional understanding of why Distribution Matching Distillation (DMD) succeeds, and seeks to explain why Classifier-Free Guidance (CFG) is essential in complex text-to-image generation tasks.

#Review #Diffusion Models #Model Distillation #Classifier-Free Guidance (CFG) #Distribution Matching #Text-to-Image Generation #Few-step Generation #Regularization #Score-based Models

November 30, 2025

[Paper Review] CaptionQA: Is Your Caption as Useful as the Image Itself?

This paper points out that existing MLLM evaluation methods overlook the practical utility of captions, that is, their ability to stand in for the image in downstream tasks.

#Review #Image Captioning #Caption Evaluation #Multimodal LLM #Utility-based Benchmark #Question Answering (QA) #Domain-specific Taxonomy #Hallucination #MLLM Evaluation

November 30, 2025

[Paper Review] Captain Safari: A World Engine

This paper aims to overcome the limitations of existing video world models: a lack of long-term 3D consistency, difficulty following aggressive 6-DoF camera trajectories, and limited ability to represent complex outdoor environments.

#Review #World Engine #3D Consistent Video Generation #Pose-conditioned Memory #Camera Control #FPV Video Synthesis #Diffusion Models #Drone Video Dataset

November 30, 2025

[Paper Review] Architecture Decoupling Is Not All You Need For Unified Multimodal Model

This paper aims to improve the performance of unified multimodal models (UMMs) by mitigating the inherent conflict between visual generation and understanding tasks without relying excessively on architecture decoupling. It addresses the problem that excessive decoupling undermines a unified model's interactive reasoning and knowledge-transfer capabilities.

#Review #Unified Multimodal Models #Architecture Decoupling #Cross-Modal Attention #Attention Interaction Alignment (AIA) Loss #Task Conflicts #Image Generation #Image Understanding

November 30, 2025

[Paper Review] AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

This paper addresses the high cost of collecting diverse multi-person data and the difficulty of driving multiple people with coherent interactions. In particular, it aims to scalably generate multi-person talking videos with natural gestures, vivid emotions, and rich interactivity, even from a small amount of multi-person data.

#Review #Multi-Person Video Generation #Audio-Driven Animation #Diffusion Models #Interactivity Refinement #Identity-Aware Attention #Scalability #Data Efficiency

November 30, 2025

[Paper Review] Adversarial Flow Models

This paper addresses the training instability of GANs (Generative Adversarial Networks), as well as the low-resolution discretization error and iterative inference cost of Flow Matching models.

#Review #Generative Models #Adversarial Flow Models #GANs #Flow Matching #Optimal Transport #Single-step Generation #Image Generation #Transformer Architecture

November 30, 2025

[Paper Review] What does it mean to understand language?

This paper proposes the hypothesis that deep human language understanding does not take place solely within the brain's core language system; rather, information obtained from that system must be exported to other specialized brain regions for processing.

#Review #Language Understanding #Cognitive Neuroscience #Situation Models #World Knowledge #Embodiment #fMRI #Large Language Models #Brain Networks

November 27, 2025

[Paper Review] Video Generation Models Are Good Latent Reward Models

The goal is to address the existing limitations of Reward Feedback Learning (ReFL), which aligns video generation models with human preferences: high memory usage, long training times, and a lack of supervision over the early generation stages.

#Review #Video Generation #Reward Feedback Learning #Latent Space #Diffusion Models #Human Preferences #Motion Quality #Process-aware

November 27, 2025

[Paper Review] Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

This study addresses a limitation of existing multimodal evaluation benchmarks, which focus only on a single, holistic preference and overlook fine-grained per-criterion judgments and conflicts between criteria.

#Review #Multimodal Judges #LMM Evaluation #Pluralistic Criteria #Criteria-Following #Trade-off Sensitivity #Conflict Resolution #Reward Models #Benchmark

November 27, 2025

[Paper Review] MIRA: Multimodal Iterative Reasoning Agent for Image Editing

This paper aims to address the semantic drift and editing failures that occur when diffusion-based image editing models fail to accurately interpret complex user instructions (compositional relations, contextual cues, referring expressions, and so on).

#Review #Image Editing #Multimodal AI #Iterative Reasoning #Agentic AI #Reinforcement Learning #Diffusion Models #Vision-Language Models #Instruction Following

November 27, 2025

[Paper Review] Canvas-to-Image: Compositional Image Generation with Multimodal Controls

This study aims to address the limited compositional ability and low fidelity that arise when state-of-the-art diffusion models handle multiple types of control signals at once, such as text prompts, object references, spatial arrangements, pose constraints, and layout annotations.

#Review #Image Generation #Diffusion Models #Compositional Control #Multimodal Control #Unified Canvas #Multi-Task Learning #Personalization

November 27, 2025

[Paper Review] Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

The goal is to overcome a limitation of current MLLMs (Multimodal Large Language Models): they solve each problem de novo and so repeat the same visual-attention and logical-reasoning errors.

#Review #Multimodal LLMs #Semantic Memory #Agentic Learning #Error Attribution #Visual Reasoning #Long-term Memory #Grow-and-Refine #Multimodal Reasoning

November 27, 2025

[Paper Review] Terminal Velocity Matching

The paper aims to build, in a single training stage, a generative model that produces high-quality samples quickly and efficiently and scales to high-dimensional data.

#Review #Generative Models #Flow Matching #Diffusion Models #One-Step Generation #Few-Step Generation #Wasserstein Distance #Transformer Architecture #Lipschitz Continuity

November 26, 2025