Latest Posts

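[Open WebUI] Optimizing Rendering by Caching the KaTeX Module Import as a Singleton

Analyzes an optimization that fixes the inefficiency of re-importing KaTeX for every message containing math formulas in Open WebUI, using a singleton pattern built on Svelte's context='module'.

#Open WebUI #Svelte #KaTeX #Performance #Module Cache

February 26, 2026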

[faster-qwen3-tts] Adding Native Windows Setup and Benchmark Scripts

Adds setup_windows.bat and benchmark_windows.bat so the project can run directly on Windows without WSL.

#faster-qwen3-tts #TTS #Windows #DevEx

February 26, 2026

[Open WebUI] Bypassing the JSON.stringify Comparison in ResponseMessage with an O(1) Fast Path

An analysis of how the two O(n) JSON.stringify calls made on every token during streaming are bypassed by comparing the content and done fields.

#Open WebUI #JavaScript #Performance #Svelte #Streaming

February 26, 2026

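[faster-qwen3-tts] Serializing Generation Requests and Adding Model Caching

Uses an asyncio Lock to prevent concurrent generation and caches loaded models to avoid reloading when switching models.

#faster-qwen3-tts #TTS #Concurrency #Caching

February 26, 2026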
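
The pattern named in the faster-qwen3-tts entry above (one lock to serialize generation, plus a cache of loaded models) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ModelServer`, `fake_loader`, and `demo` are hypothetical names, and the "model" is a stand-in function rather than a real TTS model.

```python
import asyncio
from typing import Callable, Dict

# Hypothetical stand-in: a "model" is just a function from text to output.
Model = Callable[[str], str]


class ModelServer:
    """Serializes generation with one asyncio.Lock and caches loaded models."""

    def __init__(self, loader: Callable[[str], Model]) -> None:
        self._loader = loader                # assumed expensive load function
        self._cache: Dict[str, Model] = {}   # model name -> loaded model
        self._lock = asyncio.Lock()          # only one generation at a time
        self.load_count = 0                  # counts actual (uncached) loads

    async def generate(self, model_name: str, text: str) -> str:
        async with self._lock:
            model = self._cache.get(model_name)
            if model is None:
                # Load once; later switches back to this model reuse the cache.
                model = self._loader(model_name)
                self._cache[model_name] = model
                self.load_count += 1
            await asyncio.sleep(0)  # placeholder for awaiting real work
            return model(text)


def fake_loader(name: str) -> Model:
    # Hypothetical loader used only for this sketch.
    return lambda text: f"{name}:{text}"


async def demo():
    server = ModelServer(fake_loader)
    results = await asyncio.gather(
        server.generate("a", "hello"),
        server.generate("b", "hi"),
        server.generate("a", "again"),
    )
    return results, server.load_count
```

Holding the lock across both the cache lookup and the generation call keeps the sketch simple; a real server might load models outside the lock or bound the cache size, which the entry does not specify.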

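[triton] Passing More Metadata to the Proton Kernel Launcher

Analyzes a change that passes additional metadata, such as numThreads and sharedMemBytes, to Proton's metric kernel launches so that GPU resource usage can be controlled more precisely.

#Triton #Proton #Profiling #GPU #KernelLaunch

February 26, 2026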

[vllm] --performance-mode: Per-Workload Optimization Profiles

Three modes (balanced, interactivity, throughput) automatically adjust the CUDA Graph capture strategy and batching behavior.

#vllm #Performance

February 26, 2026

[triton] Unifying global_scratch_alloc Allocation Across Backends

Analyzes a change that splits the Proton profiler's scratch memory into a separate pool and adds third-party allocation support, unifying global scratch memory management.

#Triton #GPU #MemoryAllocation #Proton #Refactoring

February 26, 2026

[Paper Review] World Guidance: World Modeling in Condition Space for Action Generation

This paper addresses the difficulty Vision-Language-Action (VLA) models have in keeping future representations efficient and predictable while preserving enough fine-grained information for precise action generation.

#Review #World Model #Action Generation #Vision-Language-Action Models (VLA) #Condition Space #Imitation Learning #Robotics #Generalization #Human Manipulation

February 25, 2026

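[Paper Review] VecGlypher: Unified Vector Glyph Generation with Language Models

Aims to address the limited accessibility and editability of existing vector glyph generation pipelines, which depend on manually curated exemplar sheets and raster-to-vector postprocessing. The goal is to develop VecGlypher, a single multimodal language model that directly generates high-quality, editable vector glyphs from natural-language descriptions or image exemplars alone.

#Review #Vector Graphics #Glyph Generation #Language Models #Multimodal AI #SVG #Font Design #Text-to-Vector #Image-to-Vector

February 25, 2026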

[Paper Review] UniVBench: Towards Unified Evaluation for Video Foundation Models

This paper addresses the limitations of existing benchmarks, which are fragmented and narrow in scope, for evaluating the unified capabilities of video foundation models (VFMs).

#Review #Video Foundation Models #Unified Evaluation #Multi-task Learning #Video Understanding #Video Generation #Video Editing #Video Reconstruction #Agentic Evaluation #Cinematic Dimensions

February 25, 2026

[Paper Review] The Design Space of Tri-Modal Masked Diffusion Models

This paper introduces the first tri-modal masked diffusion model (MDM) pretrained from scratch on text, image-text, and audio-text data.

#Review #Masked Diffusion Models #Multimodal AI #Scaling Laws #Discrete Diffusion #SDE Parameterization #Hyperparameter Transfer #Unified Generation

February 25, 2026

[Paper Review] Solaris: Building a Multiplayer Video World Model in Minecraft

The goal is to overcome the limitations of existing single-agent video world models and build Solaris, a multi-agent video world model that can simulate consistent multi-view observations in complex 3D environments such as Minecraft.

#Review #Multi-agent World Models #Video Diffusion Models #Minecraft #Self Forcing #Checkpointed Self Forcing #Multi-view Consistency #Data Collection #Embodied AI

February 25, 2026

[Paper Review] SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model

This paper aims to develop a multimodal video foundation model that handles diverse inputs, including text, image, video, mask, and audio references, and supports video-audio generation, inpainting, and editing within a single framework.

#Review #Multi-modal Generation #Video-Audio Synthesis #Video Inpainting #Video Editing #Diffusion Transformer #MMLM #Super-resolution #Frame Interpolation

February 25, 2026

[Paper Review] SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models

To speed up slow diffusion model inference, the paper addresses the problem that existing caching methods rely only on raw feature differences, which entangles content with noise and therefore overlooks spectral evolution.

#Review #Diffusion Models #Model Acceleration #Feature Caching #Spectral Analysis #Generative AI #Image Generation #Video Generation #Latency Reduction

February 25, 2026

[Paper Review] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

This paper aims to solve the object hallucination problem in Large Vision-Language Models (LVLMs), where the model's output mentions objects that are not present in the input image.

#Review #Large Vision-Language Models (LVLMs) #Object Hallucinations #Language Priors #Contrastive Decoding #Dynamic Suppression #Training-Free #Multimodal AI

February 25, 2026

[Paper Review] NanoKnow: How to Know What Your Language Model Knows

This study seeks to answer fundamental questions about how large language models (LLMs) acquire and use knowledge. In particular, it tackles the difficulty of tracing where knowledge comes from when LLM pretraining data often remains a 'black box', and aims to clarify how parametric knowledge and external knowledge interact.

#Review #LLM Knowledge #Pre-training Data #Retrieval-Augmented Generation (RAG) #FineWeb-Edu #nanochat #Benchmarking #Question Answering #Data Attribution

February 25, 2026

[Paper Review] MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

This paper aims to propose MolHIT, a new molecular graph generation framework that resolves the low chemical validity and limited structural novelty of existing molecular graph generation models, graph diffusion models in particular, and outperforms 1D sequence-based models.

#Review #Molecular Generation #Graph Diffusion Models #Hierarchical Diffusion #Discrete Diffusion #Atom Encoding #Drug Discovery #Material Science

February 25, 2026

[Paper Review] Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions

This paper addresses the uncertainty about how prevalent defects, or 'smells', are in Model Context Protocol (MCP) tool descriptions and what impact they have.

#Review #Model Context Protocol #AI Agents #Tool Descriptions #Software Smells #Prompt Engineering #Foundation Models #Performance Evaluation #Ablation Study

February 25, 2026

[Paper Review] MoBind: Motion Binding for Fine-Grained IMU-Video Pose Alignment

Aims to learn a joint representation for fine-grained alignment between IMU signals and 2D pose sequences extracted from video.

#Review #Multi-modal Alignment #Contrastive Learning #IMU-Video Fusion #Pose Estimation #Temporal Synchronization #Human Motion Analysis #Hierarchical Learning

February 25, 2026

[Paper Review] JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

Aims to close the gap between existing open-source joint audio-video generation (JAVG) models and commercial models (e.g., Veo3) in generation quality, temporal synchronization, and human preference alignment.

#Review #Joint Audio-Video Generation #Diffusion Transformer #Modality-specific Mixture-of-Experts #Temporal-Aligned ROPE #Direct Preference Optimization #Multimodal Generation #Text-to-AV

February 25, 2026