Latest Posts

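[vllm] vLLM Optimizes the DeepSeek-V4 K Cache Kernel: Adopting CuteDSL for Better Performance

An analysis of a PR that improves the performance of vLLM's DeepSeek-V4 model by raising the memory-bandwidth utilization of its K cache kernel.

#vLLM #DeepSeek-V4 #Performance Optimization #GPU Kernel #CuteDSL #Triton

May 11, 2026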

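[sglang] SGLang's Breakable CUDA Graph Optimization: Overcoming Batch Size Limits

An analysis of an architectural improvement in SGLang that addresses the batch-size constraints of CUDA Graphs and enables more flexible inference.

#SGLang #CUDA Graph #LLM #Inference Optimization #PyTorch

May 11, 2026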

[flashinfer] FlashInfer Introduces Dynamic Token Page Kernels to Optimize TRTLLM-GEN GQA Performance

FlashInfer added dynamic token page support to its TRTLLM-GEN GQA kernels, improving LLM inference performance.

#FlashInfer #LLM #Optimization #GQA #TRTLLM-GEN #Performance

May 11, 2026

[Paper Review] Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages

This paper corrects a prevailing economic misconception about how AI agents set wages in the cognitive labor market and proposes a new pricing framework.

#Review #AI Agents #Factor Pricing #Compute-Anchored Wage #Labor Market #Capital-to-Labor Conversion

May 10, 2026

[Paper Review] What if AI systems weren't chatbots?

This paper argues that AI is converging too quickly on the conversational chatbot interface and analyzes the structural social, economic, and environmental harms this paradigm brings.

#Review #Conversational AI #Chatbots #User Agency #Sociotechnical Systems #Human-Computer Interaction #AI Governance #Environmental Justice

May 10, 2026

[Paper Review] What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

This paper argues that existing tokenizers for Latent Diffusion Models (LDMs) are designed mainly for reconstruction fidelity and therefore fail to form a latent space that suits the training of the diffusion generative model itself.

#Review #Latent Diffusion Models #Tokenizer #Latent Manifold #Prior Alignment #Autoencoder #Generative Modeling #Representation Learning

May 10, 2026

[Paper Review] UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

This paper proposes UniSD, a unified Self-Distillation framework, to address the cost and security risks that arise when LLM post-training relies too heavily on external models.

#Review #Self-Distillation #Large Language Models #On-Policy Learning #Supervision Reliability #Representation Alignment #Training Stability

May 10, 2026

[Paper Review] UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

This paper addresses the problem that existing prefill acceleration techniques are ill-suited to modern hybrid LLM architectures and to continuous batching environments.

#Review #Long-Context LLM #Prefill Acceleration #Dynamic Sparsification #Hybrid Architectures #Continuous Batching #vLLM

May 10, 2026

[Paper Review] Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

This paper points out that although recent unified multimodal models (UMMs) integrate understanding and generation in a single model, the two components are in practice designed as decoupled structures that do not interact, which caps how far performance can be pushed.

#Review #Unified Multimodal Models #Understanding-Oriented Post-Training #Generation Synergy #Flow Matching #Semantic Supervision #MetaQuery

May 10, 2026

[Paper Review] SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

This paper aims to overcome the opposing limitations of existing drafters for Speculative Decoding.

#Review #LLM Inference #Speculative Decoding #Tree-based Verification #Block-Iterative Drafting #Rank-guided Expansion #Serving-time Adaptation

May 10, 2026

[Paper Review] Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

This paper tackles two problems: modern Vision-Language Models (VLMs) are extremely vulnerable to adversarial attacks, and existing detection methods cannot cope with strong attacks or data distribution shifts in realistic deployment settings.

#Review #Vision-Language Models #Adversarial Attack Detection #Sparse Autoencoders #Plug-and-Play #Robustness #Out-of-Domain Generalization

May 10, 2026

[Paper Review] SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

This work addresses the high sensitivity of different agent frameworks to the prompt format of skills, the performance gaps that result from it, and the security vulnerabilities of existing skills.

#Review #LLM-Agents #Skill compilation #Prompt engineering #Format adaptation #Security hardening #Intermediate representation

May 10, 2026

[Paper Review] Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

This paper targets two bottlenecks of long-context inference in decoder-only models: the high compute cost of the Prefill stage and the KV-cache memory-bandwidth limit of the Decode stage.

#Review #Long-Context Inference #KV-Cache #Phase-Asymmetric #Prefill #Decode #Transformer

May 10, 2026

[Paper Review] Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts

This paper notes that existing continual learning (CL) methods have been validated only on a limited number of tasks, around 20, and aims to solve the performance degradation that arises over very long task sequences.

#Review #Continual Learning #Class-Incremental Learning #Mixture-of-Experts #Bi-Level Routing #Long Task Sequence

May 10, 2026

[Paper Review] STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

This paper aims to solve the structural fragmentation of generation mechanisms in existing unified multimodal models.

#Review #Multimodal Generation #Normalizing Flows #Autoregressive Transformers #Pretzel Architecture #Unified Modeling #Visual Understanding

May 10, 2026

[Paper Review] SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

This paper addresses the breakdown of semantic commitments, which it calls the Conceptual Rift, that modern text-to-image models suffer when they must satisfy complex visual intent.

#Review #Text-to-Image Generation #Agentic Framework #Semantic Commitments #Structured Specification #Skill Orchestration #Gen-Arena

May 10, 2026

[Paper Review] Rethinking State Tracking in Recurrent Models Through Error Control Dynamics

This paper shows that the state-tracking ability of recurrent architectures is not determined by theoretical expressivity alone; it is governed by error-control dynamics that keep hidden-state drift in check.

#Review #State Tracking #Recurrent Models #Error Control #Affine Recurrences #State-Space Models #Symbolic Dynamics

May 10, 2026

[Paper Review] Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

This paper shows that RL, widely considered essential for improving LLM reasoning, does not actually learn new strategies; instead, it reweights the probability distribution over solutions that already exist in the base model. Existing RLVR inefficiently applies gradient updates across all tokens, yet the actual gains in reasoning performance arise at only a handful of points.

#Review #Large Language Models #Reinforcement Learning #Reasoning #Decision Points #Sparse Policy Selection #Contrastive Fine-Tuning #Entropy-Gated

May 10, 2026

[Paper Review] R^3-SQL: Ranking Reward and Resampling for Text-to-SQL

This work aims to solve two core challenges in the ranking stage of existing Text-to-SQL systems: Functional Inconsistency and Bounded Recall.

#Review #Text-to-SQL #Ranking #Resampling #Functional Inconsistency #Bounded Recall #Agentic Workflow

May 10, 2026

[Paper Review] Normalizing Trajectory Models

This paper addresses the limitations of the Gaussian approximation that existing diffusion and flow matching models run into during few-step generation.

#Review #Normalizing Trajectory Models #Flow Matching #Normalizing Flows #Few-step Generation #Exact Likelihood #Stochastic Trajectory

May 10, 2026