#H2D

1 post

[sglang] [Performance Optimization] Making the `latest_output_ids` H2D Copy Asynchronous in SGLang's `prepare_for_decode` Improves Decoding Throughput by 30%

A case study of how making the H2D (host-to-device) copy of `latest_output_ids` asynchronous during SGLang decoding significantly improved performance.

#SGLang #PyTorch #CUDA #Performance Optimization #GPU #LLM #H2D #Asynchronous Programming

June 17, 2026