Latest Posts

[Paper Review] PEARL: Personalized Streaming Video Understanding Model

Human cognition of new concepts is inherently a streaming process: we continually recognize new objects and identities and update our memories over time. Current multimodal personalization methods, however, are largely limited to static images or offline videos.

#Review #Personalized Streaming Video Understanding #PSVU #PEARL-Bench #Dual-grained Memory System #Concept-aware Retrieval Algorithm #Vision-Language Models #Real-time AI Assistants

March 24, 2026

[Paper Review] MultiBind: A Benchmark for Attribute Misbinding in Multi-Subject Generation

Recent multi-reference image generation systems are raising expectations for fine-grained control over multiple entities within a single image.

#Review #Multi-subject Generation #Attribute Misbinding #Image Generation #Benchmark #Evaluation Protocol #Deep Learning #Computer Vision

March 24, 2026

[Paper Review] MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Despite recent advances in Vision-Language Models (VLMs), most existing document OCR systems still rely on autoregressive (AR) decoding.

#Review #Document OCR #Diffusion Models #Inverse Rendering #Parallel Decoding #Block-Attention #Curriculum Learning #Vision-Language Models

March 24, 2026

[Paper Review] From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

LLM-based systems are evolving beyond simple chatbots that answer a single prompt. They now solve tasks by composing executable workflows that combine LLM calls, information retrieval, tool use, code execution, memory updates, and verification.

#Review #LLM Agents #Workflow Optimization #Agentic Computation Graphs (ACGs) #Static Optimization #Dynamic Optimization #Runtime Adaptation #Evaluation Protocol #Feedback Signals

March 24, 2026

[Paper Review] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution.

#Review #Multimodal AI Agents #Web-agent Benchmark #Egocentric Video #Visual Grounding #Online Evaluation #LLM-as-a-Judge #Perception-Action Alignment

March 24, 2026

[Paper Review] DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

Optical flow models trained on high-quality data degrade severely when faced with real-world corruptions such as blur, noise, and compression artifacts.

#Review #Optical Flow Estimation #Diffusion Models #Degradation-Aware #Image Restoration #Dense Correspondence #Spatio-Temporal Attention #Hybrid Architecture

March 24, 2026

[Paper Review] Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

Recent advances in Multi-modal Large Language Models (MLLMs) have brought substantial progress in general-purpose video understanding. However, these models struggle badly with long-form, high-resolution video.

#Review #Video Understanding #Multi-modal Large Language Models (MLLMs) #Vision Transformers (ViTs) #Autoregressive Gazing #Token Reduction #Multi-scale Patches #High-Resolution Video #Long-Form Video

March 24, 2026

[Paper Review] 2Xplat: Two Experts Are Better Than One Generalist

Conventional 3D Gaussian Splatting (3DGS) pipelines rely on a compute-intensive iterative optimization procedure that takes tens of minutes to several hours per scene, which has limited their broader adoption.

#Review #3D Gaussian Splatting (3DGS) #Pose-free #Feed-forward #Two-Experts Architecture #Geometry Estimation #Appearance Modeling #Novel View Synthesis #Training Efficiency

March 24, 2026

[Triton] Adding input microblock size to the matmul kernel signature

March 25, 2026

[Triton] Integrating Scale Swizzling into the GFX1250 matmul kernel

March 25, 2026

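[Ray] Advancing memory management with a pressure-based memory monitor

Memory-pressure detection based on cgroup PSI, for more precise memory management than the threshold-based approach

#Ray #Performance

March 24, 2026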

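[cpython] CPython JIT optimization: improving performance with in-place float operations

How the Tier 2 optimizer in the CPython JIT converts operations on uniquely referenced float operands into in-place updates, reducing memory allocations and improving performance.

#CPython #JIT #Optimization #Python Internals #Performance

March 24, 2026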

[Open WebUI] Batching chatEventHandler history updates with rAF

Improves performance by coalescing unnecessary Svelte reactive updates during streaming with requestAnimationFrame

#Open WebUI #Performance

March 24, 2026

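[Ray Serve] Switching the SGLang server from sequential batch processing to concurrent execution

A fix that replaces the for loop that processed multiple prompts one at a time in the completions endpoint with SGLang's native batch call, improving concurrent processing performance.

#Ray #Python #Performance #SGLang #LLM Serving

March 24, 2026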

[triton] Switching GSan tests from nanosleep to atomic-based synchronization

An analysis of how replacing nondeterministic nanosleep-based synchronization with atomic polling in the GPU Sanitizer tests greatly improved test stability.

#Triton #GSan #Testing #GPU #Synchronization

March 24, 2026

[vllm] Thinking Token Hard Limit: controlling resources by capping reasoning tokens

Sets a hard limit on a reasoning model's thinking tokens to prevent excessive compute consumption, for predictable serving

#vllm #Performance

March 24, 2026

[Gradio] Building backend profiling and benchmark infrastructure

Adds a profiling module that tracks timing for each stage of server request handling, along with benchmark scripts

#Gradio #Profiling #Benchmark #Observability

March 24, 2026

[CPython] JIT float operation optimization: reusing uniquely referenced operands

The CPython JIT mutates uniquely referenced float objects in place to eliminate memory allocations

#CPython #JIT #Optimization #Float

March 24, 2026

[Open WebUI] Unblocking the event loop from heartbeat DB writes with asyncio.to_thread

An analysis of a one-line PR that uses asyncio.to_thread to stop synchronous DB calls in the heartbeat handler from blocking the event loop.

#Open WebUI #asyncio #Python #Event Loop #Database #WebSocket

March 24, 2026
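
For readers unfamiliar with the pattern named in the entry above, here is a minimal sketch of offloading a blocking call with `asyncio.to_thread`. It is not the Open WebUI code: `write_heartbeat`, `heartbeat_handler`, and `demo` are hypothetical names, and the "DB write" is simulated with `time.sleep`.

```python
import asyncio
import time


def write_heartbeat(user_id: str) -> str:
    """Hypothetical stand-in for a synchronous (blocking) DB write."""
    time.sleep(0.05)  # simulate blocking I/O
    return f"heartbeat:{user_id}"


async def heartbeat_handler(user_id: str) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free to run other coroutines meanwhile.
    return await asyncio.to_thread(write_heartbeat, user_id)


async def demo() -> tuple[str, int]:
    ticks = 0

    async def ticker() -> None:
        # Counts how often the event loop gets to run during the write.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    result = await heartbeat_handler("u1")
    task.cancel()
    return result, ticks
```

Running `asyncio.run(demo())` returns the handler result together with a tick count; a nonzero count indicates the loop kept running while the simulated write was in progress. Calling `write_heartbeat` directly inside the coroutine would instead stall the loop for the duration of the write.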

[Paper Review] WorldCache: Content-Aware Caching for Accelerated Video World Models

Video world models based on Diffusion Transformers (DiTs) are essential for predicting physically consistent future visual states, but their sequential denoising process and expensive spatio-temporal attention make them computationally costly.

#Review #Diffusion Transformers #Video World Models #Feature Caching #Inference Acceleration #Content-Aware Caching #Motion-Adaptive Caching #Perception-Constrained Caching #Optimal Feature Approximation

March 23, 2026