#speculative-decoding

4 posts

[vLLM] Tree Attention: Tree-Structured Attention for Speculative Decoding

An analysis of vLLM's Tree Attention backend, covering attention mask generation for verifying tree-structured tokens in speculative decoding and the Triton-based unified attention.

#vllm #tree-attention #speculative-decoding #triton

April 8, 2026

[vLLM] MTP & DFlash: Multi-Token Prediction and Flash-Based Drafting

An analysis of vLLM's DFlash speculative decoding implementation, covering the core logic of DFlashProposer, which implements multi-token prediction (MTP) on top of Flash Attention.

#vllm #speculative-decoding #mtp #dflash #flash-attention

April 8, 2026

[vLLM] EAGLE: Strengthening Speculative Decoding with Hidden-State-Based Drafts

An analysis of vLLM's implementation of EAGLE, which improves draft accuracy by directly using the target model's hidden states.

#vllm #eagle #speculative-decoding #hidden-states

April 7, 2026

[vLLM] Speculative Decoding: How a Draft Model Accelerates LLM Decoding

An analysis of vLLM's implementation of speculative decoding, in which a small draft model generates several tokens ahead and a large target model verifies them in a single pass.

#vllm #speculative-decoding #inference-acceleration #draft-model

April 7, 2026