Latest Posts

[Paper Review] EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

Existing Video Object Removal methods rely mainly on input masks to remove objects, and as a result they fail to properly handle the complex visual side effects an object causes, such as shadows, reflections, and deformations [cite: 1, Figure 2].

#Review #Video Object Removal #Video Object Insertion #Diffusion Models #Effect Erasing #Reciprocal Learning #Deep Learning #Computer Vision

March 19, 2026

[Paper Review] Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

Efforts toward unified multimodal modeling raise the requirement that visual models, like language models, operate on semantically meaningful tokens.

March 19, 2026

[Paper Review] Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

Multimodal Large Language Models (MLLMs) have achieved remarkable success in interpreting natural scenes, but the ability to process Discrete Symbols, a basic building block of human cognition, remains an important open challenge.

#Review #Multimodal Large Language Models (MLLMs) #Discrete Symbols #Cognitive Mismatch #Symbol Understanding #Benchmark #Recognition-Reasoning Inversion #Human Cognition

March 19, 2026

[Paper Review] Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Prior Motion Generation research has largely followed one of two paradigms: Continuous Diffusion Models, which excel at Kinematic Control, or Discrete Token-based Generators, which are effective for Semantic Conditioning.

#Review #Motion Generation #Diffusion Models #Discrete Tokens #Kinematic Control #Semantic Conditioning #Motion Tokenizer #Perception-Planning-Control

March 19, 2026

[Paper Review] 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model

Demand is growing for dynamic, view-consistent video generation of customized subjects in fields such as immersive VR/AR, virtual production, and next-generation e-commerce.

#Review #3D-aware video generation #subject-driven customization #multi-view conditioning #video diffusion models #LoRA #temporal dynamics #3Dapter #3DreamBooth

March 19, 2026

[Triton] Stream-k and Attention Decode Kernel Updates for GFX1250

March 20, 2026

[triton] Custom DSL Plugin Ops Support

Analyzes a PR that adds custom op registration to the Triton plugin system, allowing third parties to integrate their own DSL operations into the Triton frontend.

#Triton #Plugin System #DSL #Extensibility #Frontend

March 19, 2026

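[triton] Simplifying and Restoring the getTranspositionSelectors Algorithm

Analyzes a case that resolves a correctness issue with multiple mixed transpositions and clearly lays out the mathematical decomposition of the prmt selector algorithm.

#Triton #GPU #LinearLayout #Optimization #Algorithm

March 19, 2026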

[triton] Adding Multi-CTA Support to ConSan

Analyzes a PR that adds multi-CTA cluster support to Triton's Concurrency Sanitizer (ConSan) so that it correctly tracks the state of scratch memory shared by multiple CTAs within a cluster.

#Triton #GPU Compiler #Concurrency Sanitizer #Multi-CTA #CUDA

March 19, 2026

[axolotl] Axolotl: Optimizing Entropy and Selective Log Softmax with Triton Kernels

An analysis of a PR that uses Triton kernels in Axolotl to optimize the Entropy and Selective Log Softmax computations, significantly improving training performance.

#Triton #PyTorch #Optimization #Deep Learning #Performance #GPU

March 19, 2026

[axolotl] Stabilizing Triton LoRA Kernel Autotune Tests: A Module Isolation Strategy for pytest-xdist Environments

Analyzes a case in which flaky tests caused by sys.modules sharing during parallel pytest-xdist runs were fixed by patching _find_lora_ops_module directly.

#Axolotl #Triton #Testing #pytest #LoRA

March 19, 2026

[axolotl] Axolotl Custom Triton Kernels: Up to 5x Faster entropy/softmax

Fuses entropy_from_logits and selective_log_softmax with Triton kernels to accelerate RLHF training.

#Triton #RLHF #Kernel Optimization #Axolotl

March 19, 2026

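[Ray] Eliminating Process Scan Cost by Caching find_gcs_addresses Results

Improves performance by caching the GCS address lookup, which previously scanned the process list on every call.

#Ray #Performance

March 18, 2026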

[Loki] 50% Memory Reduction by Skipping Shuffle Shard When Shard Factor Is 1

Skips the unnecessary ShuffleShard call when a single partition is assigned, substantially reducing CPU and memory usage.

#Grafana Loki #Go #Performance #Memory Optimization #Kafka

March 18, 2026

[Paper Review] When AI Navigates the Fog of War

Existing studies of geopolitical forecasting with Large Language Models (LLMs) have struggled to accurately evaluate true out-of-distribution reasoning ability because of data leakage.

March 18, 2026

[Paper Review] VideoAtlas: Navigating Long-Form Video in Logarithmic Compute

Understanding long-form video is a hard problem of locating sparse, task-relevant evidence within a vast temporal space. Existing video-language model (VLM) approaches face two main challenges.

March 18, 2026

[Paper Review] Video-CoE: Reinforcing Video Event Prediction via Chain of Events

Despite advances in MLLM applications for video tasks, Video Event Prediction (VEP) remains relatively unexplored.

#Review #Video Event Prediction (VEP) #Multimodal Large Language Models (MLLMs) #Chain of Events (CoE) #Logical Reasoning #Visual Grounding #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT)

March 18, 2026

[Paper Review] Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Video VLMs encode a large number of frames, and each frame is decomposed by a Vision Transformer (ViT) into hundreds of Patch Tokens, incurring enormous computational cost.

#Review #Token Pruning #Video-Language Models (VLMs) #Computational Efficiency #Spatio-Temporal Scoring #Vision Transformers (ViT) #Large Language Models (LLM) #End-to-End Training

March 18, 2026

[Paper Review] Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models

Recent MLLMs have greatly advanced their visual understanding through video-based Supervised Fine-tuning (Video-SFT). However, how Video-SFT affects the fine-grained evolution of visual capabilities, in particular the balance between spatial and temporal understanding, has not yet been properly studied.

#Review #Multimodal Large Language Models (MLLMs) #Video-SFT #Temporal Trap #Spatial Understanding #Temporal Budget #Hybrid-Frame Strategy #Negative Transfer

March 18, 2026

[Paper Review] Stereo World Model: Camera-Guided Stereo Video Generation

Existing generative world models mostly use monocular video representations, which carry fundamental geometric limitations such as implicit depth, ambiguous scale, and 3D error that accumulates over long-horizon camera trajectories.

March 18, 2026