Latest Posts

[Paper Review] NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

This paper addresses the way existing diffusion models destroy the spatial structure of data. It proposes a new diffusion process that preserves image phase, enabling structure-aligned generation without architectural changes or additional parameters.

#Review #Diffusion Models #Phase Preservation #Frequency Domain #Structure-Aligned Generation #Image-to-Image Translation #Sim-to-Real #Generative AI

December 4, 2025

[Paper Review] Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing

This paper aims to use AI to discover new mathematical upper bounds for sphere packing, a problem where each evaluation takes several days to compute. In this setting, conventional data-intensive AI approaches are impractical, so the paper tackles the problem with a sample-efficient, model-based framework.

#Review #Sphere Packing #Mathematical Discovery #Semidefinite Programming (SDP) #Bayesian Optimization (BO) #Monte Carlo Tree Search (MCTS) #Sample-Efficient AI #Model-Based Learning #Geometric Constraints

December 4, 2025

[Paper Review] Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

This paper aims to jointly mitigate visual object hallucinations and temporal action hallucinations in the descriptions that multimodal LLMs (MLLMs) generate for video understanding tasks.

#Review #Multimodal LLMs #Video Understanding #Hallucination Mitigation #Object Hallucination #Action Hallucination #Contrastive Learning #Self-Augmentation #Tracklet-Phrase Alignment

December 4, 2025

[Paper Review] Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

This paper aims to solve the catastrophic forgetting that occurs when an instruct LLM is adapted to a new language using only unlabeled target-language data, without costly specialized labeled data.

#Review #Large Language Models (LLMs) #Catastrophic Forgetting #Language Adaptation #Continual Pre-training #Parameter Freezing #Low-Resource Languages #Source Knowledge Preservation

December 4, 2025

[Paper Review] Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

This paper aims to overcome the sequential computation and long-term inconsistency of existing diffusion-based video generation methods. The goal is infinite-length, high-quality audio-driven avatar generation with a 14-billion-parameter diffusion model in a real-time streaming setting.

#Review #Audio-Driven Avatar Generation #Real-time Streaming #Diffusion Models #Infinite Length #Pipeline Parallelism #Temporal Consistency #Model Distillation

December 4, 2025

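[Paper Review] LATTICE: Democratize High-Fidelity 3D Generation at Scale

This paper aims to close the quality and scalability gap between 3D and 2D generative models in high-quality 3D asset generation. In particular, it seeks to overcome the limitations caused by the high computational complexity of 3D generation and the lack of an efficient asset encoding scheme, and to define an effective 3D representation that improves model scalability and performance.

#Review #3D Generation #High-Fidelity #Latent Representation #Voxel Grid #Diffusion Models #Transformer #Scalable AI #Asset Creation

December 4, 2025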

[Paper Review] Generative Neural Video Compression via Video Diffusion Prior

This paper aims to solve the blurriness, loss of detail, and perceptual flickering that existing video compression methods exhibit at ultra-low bitrates.

#Review #Neural Video Compression #Diffusion Models #Generative Models #Video Compression #Temporal Coherence #Perceptual Quality #Flow Matching #Video Diffusion Transformer (VideoDiT)

December 4, 2025

[Paper Review] GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces

This paper addresses the slow optimization and multi-view inconsistency of existing text-to-3D stylization methods. It aims for instant, high-quality stylization of 3D Gaussian Splatting (3DGS) assets that preserves geometry and maintains multi-view consistency.

#Review #3D Gaussian Splatting #Text-to-3D Stylization #Latent Diffusion Models #Disentangled Latent Spaces #Feed-forward Editing #Geometry Preservation #Multi-view Consistency

December 4, 2025

[Paper Review] FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring

This paper aims to restore sharp, high-resolution (HR) video by addressing the compound degradation caused by dynamically varying exposure and motion in real-world conditions. It seeks to overcome a limitation of existing video restoration methods, which assume a fixed exposure time and are therefore fragile in real scenarios.

#Review #Video Super-Resolution #Video Deblurring #Joint Restoration #Exposure-Aware #Motion Compensation #Transformer Architecture #Dynamic Filtering #Real-World Degradations

December 4, 2025

[Paper Review] EgoLCD: Egocentric Video Generation with Long Context Diffusion

This paper addresses two obstacles to generating egocentric (first-person) video that stays consistent over long durations: content drift, and the difficulty of managing long-term memory under limited compute.

#Review #Egocentric Video Generation #Long-Context Diffusion #Long-Short Memory #Sparse KV Cache #Memory Regulation Loss #Structured Narrative Prompting #World Models #Embodied AI

December 4, 2025

[Paper Review] DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Existing 4D datasets are limited in diversity, physical scale, and multimodal annotations, which has constrained the ability of foundation models to accurately interpret real-world dynamics from single-camera video.

#Review #4D World Modeling #Multimodal Data #Dynamic Scenes #Metric-Scale #Bundle Adjustment #Foundation Models #Video Analysis #Data Curation

December 4, 2025

[Paper Review] DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

This paper aims to address two main limitations of existing MLLM-based text-to-image (T2I) generation models: the abstractness of text-based planning and the difficulty of generating rare attribute combinations.

#Review #Text-to-Image Generation #Chain-of-Thought (CoT) #Multimodal Large Language Models (MLLMs) #Visual Planning #Rare Concept Generation #Drafting #Classifier-Free Guidance (CFG) #Image Refinement

December 4, 2025

[Paper Review] DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

This paper presents DAComp, a benchmark that evaluates data agents comprehensively by reflecting the complexity of real enterprise data intelligence workflows that existing benchmarks miss.

#Review #Data Agents #Benchmarking #Data Engineering #Data Analysis #LLM-as-Judge #Full Data Intelligence Lifecycle #Repository-Level #Open-Ended Tasks

December 4, 2025

[Paper Review] BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

This paper aims to resolve the coupling between scene dynamics and camera motion, a persistent problem in existing video diffusion models. It develops a 4D-controllable video generation framework that explicitly decouples control of time from control of camera pose.

#Review #Video Generation #Diffusion Models #4D Control #Camera Pose Control #Time Control #Positional Encoding #Adaptive Normalization #Synthetic Dataset

December 4, 2025

[Paper Review] Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

This study starts from the observation that, although recent text-to-image (T2I) models based on Large Vision-Language Models (LVLMs) achieve high image quality, how much they amplify social bias remains poorly understood.

#Review #Text-to-Image #LVLM #Social Bias #System Prompts #Bias Mitigation #Meta-Prompting #Fairness #Generative AI

December 4, 2025

[Paper Review] ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

This paper aims to address three weaknesses of existing multimodal reward models (RMs): hallucination, weak visual grounding, and the inability to use tools for verification.

#Review #Multimodal Reward Models #Agentic AI #Tool Use #Reinforcement Learning #Visual Reasoning #Multimodal LLMs #Instruction Following #Evaluation Benchmarks

December 4, 2025

[Paper Review] 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

This paper addresses the problem that existing approaches to building 4D semantic fields rely on Gaussian Splatting, which requires per-scene optimization and limits generalization and scalability.

#Review #4D Scene Understanding #Language Grounding #Transformer #Feed-forward Network #Semantic Field #Geometry Reconstruction #Embodied AI

December 4, 2025

[Triton] Fixing the small-batch-size benchmark on Hopper

Fixes the num_warps setting for the small-batch MLP benchmark on Hopper GPUs and adds test cases.

#Triton #Benchmark #Hopper #MLP #Bug Fix

December 4, 2025

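[Triton] Partial rollback of the SwiGLU exp2 optimization: numerical accuracy first

The exp2_ftz optimization caused numerical differences in some models, so it has been temporarily rolled back.

#Triton #Kernel #Numerical Stability #Revert #SwiGLU

December 4, 2025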

[Paper Review] ViDiC: Video Difference Captioning

This paper proposes a new task, Video Difference Captioning (ViDiC), which involves understanding and describing the visual differences between dynamic video sequences.

#Review #Video Difference Captioning #Multimodal Large Language Models #Video Understanding #Comparative Reasoning #Evaluation Benchmark #LLM-as-a-Judge #ViDiC-1K

December 3, 2025