Latest Posts

[onnxruntime] CPU GQA Optimization in ONNX Runtime: Introducing Flash Attention and Flash Decoding

Maximizes CPU performance for INT8/INT4-quantized KV caches through Flash Attention-based tiling and a Flash Decoding implementation.

#ONNX Runtime #LLM #Flash Attention #CPU Optimization #Quantization

May 29, 2026

[flashinfer] FlashInfer MLA Kernel Optimization: Maximizing Performance When num_heads < 128

Introduces an optimization that folds seqlen_q into the head dimension to improve MLA decode kernel performance on Blackwell GPUs when num_heads < 128.

#FlashInfer #GPU #MLA #Optimization #Blackwell #CUDA

May 29, 2026

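[axolotl] Axolotl MoE Model Optimization: Maximizing Performance with Tiled-MLP and FSDP2 Integration

An analysis of the Tiled-MLP introduction and FSDP2 optimizations that dramatically improved MoE model performance in Axolotl.

#Axolotl #MoE #Tiled-MLP #FSDP2 #Optimization #Performance Improvement #Deep Learning

May 28, 2026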

[feast] 4x Faster Serialization in the Feast Feature Server: Optimizing MessageToDict

Analyzes how a custom dict builder was introduced to resolve the Protobuf serialization bottleneck in Feast's Feature Server, improving performance 4x.

#Feast #Python #Protobuf #Performance #Optimization

May 28, 2026

[sglang] Analysis of Triton Kernel Optimizations That Boost Diffusion Attention Performance 7x on Blackwell (B200)

Overcomes the mask-handling limitations of PyTorch SDPA with Triton kernel fusion and Varlen FlashAttention, achieving up to a 21% performance improvement on B200.

#Triton #FlashAttention #Diffusion #CUDA #Performance Optimization #SGLang

May 28, 2026

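[vllm] vLLM MoE Permute Optimization: Improving Performance Through Buffer Preallocation

Analyzes an optimization that eliminates frequent memory allocations during MoE computation, achieving up to a 14% performance improvement at small batch sizes.

#vLLM #MoE #CUDA #PerformanceOptimization #DeepLearning

May 28, 2026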

[transformers] Optimizing Flash Attention on MPS for Apple Silicon: Better Speed and Efficiency

Introduces optimizations that improve Flash Attention performance by 1.66x in the MPS environment on Apple Silicon.

#Apple Silicon #MPS #Flash Attention #Optimization #Performance Improvement #Hugging Face Transformers

May 28, 2026

[Paper Review] minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

This paper addresses the lack of a pipeline for converting existing high-quality Video Foundation Models into Interactive World Models that support real-time interaction.

#Review #Video World Models #Diffusion Models #Autoregressive #Distillation #Real-time Inference #Camera Control

May 28, 2026

[Paper Review] YoCausal: How Far is Video Generation from World Model? A Causality Perspective

This paper examines whether state-of-the-art Video Diffusion Models (VDMs) are evolving into genuine world models or merely overfitting to statistical temporal patterns.

#Review #Video Generation #World Models #Causality #Violation of Expectation #Reverse Surprise Index #Causality Cognition Index #Diffusion Models

May 28, 2026

[Paper Review] WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

This paper proposes WorldMemArena to address two problems with existing memory benchmarks: they are biased toward static conversational data, and they evaluate memory with a single success metric, which makes the causes of failure hard to diagnose.

#Review #Multimodal Agent #Memory Benchmark #Action-World Interaction #Lifecycle Evaluation #Long-horizon #Lifelong Evolution #Agentic Execution

May 28, 2026

[Paper Review] Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

This paper seeks to identify the fundamental mechanisms by which larger models learn tasks that smaller models fail to learn.

#Review #Scaling Laws #Rare-Task Retention #Gradient Interference #Neural Network Scaling #Multi-Task Learning #Feature Learning

May 28, 2026

[Paper Review] When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

This paper addresses Contextual Belief Management (CBM): the problem of what an LLM should believe, what it should revise, and what it should ignore among the information that accumulates over long interactions. Existing LLMs tend to rely excessively on pretrained parametric knowledge or contextual noise rather than following the formal evidence provided in context.

#Review #Contextual Belief Management #Large Language Models #BeliefTrack #Reinforcement Learning #Contextual Interference #Symbolic Verification

May 28, 2026

[Paper Review] When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

This study systematically explores the design space of hybrid Multi-Agent Systems (MAS) that integrate high-performance, cloud-based frontier models with high-efficiency Small Language Models (SLMs) running on edge devices.

#Review #Multi-Agent Systems #Hybrid AI #Edge Inference #Cloud Agents #Agentic Workflow #KV-cache #Model Routing

May 28, 2026

[Paper Review] Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

This paper identifies the lack of efficient reward signals for improving the factual accuracy of LLMs on knowledge-intensive QA tasks.

#Review #Reinforcement Learning #Factuality #Process Supervision #Wikipedia #Co-occurrence #Large Language Models #GRPO

May 28, 2026

[Paper Review] Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation

This paper resolves a structural mismatch in Uniform Diffusion Models (UDMs): the Bridge Plug-in parameterization they use fails to optimize the standard denoising objective (the denoising posterior).

#Review #Uniform Diffusion Models #Leave-one-out #Denoiser #Absorbing State Reformulation #Discrete Diffusion #Bridge Plug-in

May 28, 2026

[Paper Review] UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering

This paper proposes UniSteer to address the scalability and compositional limitations of existing Activation Steering methods for controlling LLM behavior.

#Review #LLM Steering #Activation Space #Flow Matching #Text-Guided Control #Activation Inversion #Multi-Constraint #Zero-shot Classification

May 28, 2026

[Paper Review] UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

This paper aims to overcome the limits of end-to-end planning that lightweight models face in mobile GUI automation. Most current GUI agents rely on very large VLMs, which leads to high inference cost and poor reliability in compute-constrained on-device environments.

#Review #GUI Agent #Knowledge Graph #Autonomous Exploration #On-device AI #Lightweight Model #Mobile Automation

May 28, 2026

[Paper Review] Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

This study addresses the opacity and the limited use of visual material that arise when large language models (LLMs) write long, fact-based reports in Deep Research settings.

#Review #Multi-Agent System #Multimodal Deep Research #Verifiable Generation #Test-Time Scaling #Visual Working Memory #Report Generation

May 28, 2026

[Paper Review] Towards Consistent Video Geometry Estimation

This paper addresses the problem that existing video geometry estimation models are confined to either the offline (full-sequence) or the online (streaming) setting, depending on their architecture or training protocol.

#Review #Foundation Model #Video Geometry Estimation #Dynamic Chunking Attention #Depth Estimation #Surface Normal Estimation #Point Map Estimation

May 28, 2026

[Paper Review] Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

This paper points out that LoRA adapters distributed through public model hubs such as HuggingFace can be vulnerable to critical backdoors introduced through data poisoning.

#Review #LoRA Adapter #Backdoor Attack #Data Poisoning #Behavioral Detection #Weight-Level Detection #LLM Security

May 28, 2026