#DeepSeek

9 posts

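[SGLang] NSA (Native Sparse Attention): DeepSeek's sparse attention

An analysis of SGLang's NSA backend. Walks through the code to show how DeepSeek's Native Sparse Attention attends only to selected tokens, how the indexer is structured, and how the Triton/TileLang kernels work.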

#sglang #NSA #Sparse Attention #DeepSeek #Selective Attention

April 11, 2026

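[SGLang] Multi-head Latent Attention (MLA): attention with a compressed KV cache

An analysis of SGLang's MLA implementation. Covers how DeepSeek-V2's Multi-head Latent Attention compresses the KV cache, the 7x performance improvement over conventional MHA, and a code-level comparison of the three backends: FlashInfer, FlashMLA, and CUTLASS.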

#sglang #MLA #Multi-head Latent Attention #KV Compression #DeepSeek

April 11, 2026

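[sglang] DeepSeek V3/R1 inference optimization: an analysis of shared expert fusion under DeepEP

Examines an optimization that folds the shared expert into the MoE path under DeepEP, removing the overhead of running it as a separate computation and improving inference performance.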

#SGLang #DeepSeek #MoE #DeepEP #LLM Inference

April 9, 2026

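[sglang] Implementing the IndexCache optimization for DeepSeek V3.2 in SGLang

A technical analysis, with implementation details, of how introducing IndexCache for the DeepSeek V3.2 model improved inference performance by about 6.4%.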

#SGLang #DeepSeek #LLM #Optimization #Inference

April 5, 2026

[sglang] Make TRT-LLM kernels the DSA default on Blackwell GPUs

Removes the dp_size condition on Blackwell (SM>=10) GPUs so that the TRT-LLM kernels are always used by default.

#SGLang #TRT-LLM #Blackwell #DeepSeek

April 2, 2026

[sglang] Prefill batch support for the TRT-LLM sparse MLA kernel

Fixes the TRT-LLM sparse MLA kernel to use the correct page table conversion for prefill batches, improving accuracy.

#SGLang #TRT-LLM #MLA #DeepSeek #Attention

April 1, 2026

[Paper Review] HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

A detailed review of the paper 'HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention', posted to arXiv by Yuxuan Wang.

#Review #Sparse Attention #Hierarchical Indexing #Long Context #LLM Inference #Computational Efficiency #DeepSeek

March 30, 2026

[Paper Review] DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

A detailed review of the paper 'DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models', posted to arXiv.

#Review #Large Language Models #Sparse Attention #Reinforcement Learning #Agentic AI #Tool Use #Open-source LLM #DeepSeek

December 2, 2025

[SGLang] Add DeepSeek V3.2 support

Adds the DeepSeek V3.2 model and a Native Sparse Attention (NSA) backend to SGLang.

#SGLang #DeepSeek #Sparse Attention #Model Support

October 6, 2025