#Latency Hiding

1 post

[Paper Review] Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

This paper aims to overcome the structural limitations of multi-token prediction, the approach at the core of existing speculative decoding.

#Review #Speculative Decoding #Pipeline Parallelism #LLM Inference #Feature Aggregation #Latency Hiding #Throughput

June 1, 2026