Latest Posts

[Paper Review] ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation

Text-driven video generation models have democratized filmmaking, but camera control in cinematic multi-shot scenarios remains a major bottleneck.

March 12, 2026

[Paper Review] One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers

Existing Diffusion Transformers (DiTs) achieve high generation quality, but their compute cost is tied to the input image resolution, which makes the latency-quality trade-off rigid.

March 12, 2026

[Paper Review] OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

To operate in real-time streaming environments such as robots and AR devices, modern visual agents need representations that are general, causal, and physically structured.

#Review #streaming visual backbone #causal spatiotemporal attention #3D-ROPE #multi-task learning #real-time inference #embodied agents #vision-language alignment

March 12, 2026

[Paper Review] Mobile-GS: Real-time Gaussian Splatting for Mobile Devices

3D Gaussian Splatting (3DGS) has emerged as a powerful technique for high-quality novel view synthesis, but its high computational demands and large storage costs make it difficult to deploy on mobile devices for real-time rendering.

#Review #Gaussian Splatting #Mobile Rendering #Order-Independent Transparency #Neural Quantization #Real-time Rendering #View-dependent Enhancement #Spherical Harmonics Distillation #Resource-constrained Devices

March 12, 2026

[Paper Review] IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

In long-context agentic workflows, the attention efficiency of Large Language Models (LLMs) is a decisive factor in inference speed and serving cost.

March 12, 2026

[Paper Review] GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Unified Multimodal Models (UMMs) aim to integrate knowledge, structured reasoning, and controllable generation in a single system, but current image editing benchmarks [37, 57] are largely confined to the natural image domain and shallow commonsense reasoning.

March 12, 2026

[Paper Review] Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Reinforcement Learning from Verifiable Rewards (RLVR) has recently proven highly effective at improving reasoning LLMs, but it is hard to apply to non-verifiable domains, where the correctness of an output cannot be checked directly.

March 12, 2026

[Paper Review] EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models

MLLMs have recently been widely integrated into diffusion frameworks as text encoders to tackle complex tasks such as spatial reasoning, but this paradigm has two main limitations. First, MLLM text encoders show insufficient reasoning depth.

March 12, 2026

[Paper Review] EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

AR video generative models use a video tokenizer to compress pixels into a discrete visual token sequence, and the length of that sequence is critical to the balance between reconstruction quality and the computational cost of downstream generation.

March 12, 2026

[Paper Review] DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning

Large-scale diffusion models have transformed video synthesis, but precise control over multi-subject identity and multi-granularity motion remains a significant challenge.

#Review #Video Diffusion Models #Video Customization #Motion Control #Reinforcement Learning #Multi-Subject #Omni-Motion #Latent Identity #DiT

March 12, 2026

[Paper Review] DVD: Deterministic Video Depth Estimation with Generative Priors

Existing video depth estimation methods face a fundamental trade-off.

#Review #Video Depth Estimation #Generative Priors #Deterministic Adaptation #Diffusion Models #Latent Manifold Rectification #Global Affine Coherence #Zero-shot Learning #Temporal Consistency

March 12, 2026

[Paper Review] DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Agentic task synthesis for improving the tool-use abilities of LLM-based agents has recently become an active research area. However, existing approaches suffer from insufficient diversity in the synthesized tasks and therefore lack robust generalization to changes in tasks and tool sets.

#Review #Agentic Task Synthesis #Diversity Scaling #Tool Use #Generalization #Reinforcement Learning #Supervised Fine-tuning

March 12, 2026

[Paper Review] Coarse-Guided Visual Generation via Weighted h-Transform Sampling

Coarse-guided visual generation is essential to a range of real-world applications such as deblurring and super-resolution.

#Review #Guided Visual Generation #Diffusion Model #Doob's h-Transform #Coarse-guided Generation #Training-free #Image Restoration #Video Generation #Weighted Sampling

March 12, 2026

[Paper Review] Automatic Generation of High-Performance RL Environments

In typical Reinforcement Learning (RL) training, environment simulation consumes 50-90% of total wall-clock time, making it the main bottleneck of the training process.

March 12, 2026

[Paper Review] Are Video Reasoning Models Ready to Go Outside?

Vision-Language Models (VLMs) deployed in real-world environments frequently encounter disturbances such as weather, occlusion, and camera motion.

March 12, 2026

[Paper Review] Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data

Although most English speakers are non-native (L2) speakers, current Text-To-Speech (TTS) systems mainly model American-accented English because accented data is scarce.

#Review #Text-To-Speech #Controllable Speech Synthesis #Accented Speech Generation #Accent Vector #Multilingual TTS #LoRA

March 12, 2026

[Triton] Proton CuptiProfiler: Assorted Bug Fixes and Improvements

March 13, 2026

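[triton] Major Refactoring of the AMD GFX1250 MXFP Flash Attention Example Kernel

An analysis of a refactoring that simplifies the GFX1250 FA example and improves its performance by removing the preshuffle logic, introducing TDM store, and switching to expand_dims.

#Triton #AMD #GPU #FlashAttention #GFX1250 #Refactoring

March 12, 2026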

[Ray] Adding a cuDF Batch Format to Ray Data

Adds cudf.DataFrame to Ray Data's batch_format to support GPU-native data processing pipelines.

#Ray #GPU #cuDF #Data Processing

March 12, 2026

[pytorch] Inductor: Fixing Precision Loss by Preventing addmm Unfuse in bf16/fp16

An analysis of a bug fix that prevents the PyTorch Inductor pattern matcher from unfusing half-precision addmm, stopping the truncation errors that accumulate in deep models.

#PyTorch #Inductor #Precision #bf16 #fp16 #Pattern Matching #Compiler

March 11, 2026