secrett2633's blog

Categories

Python

PEP (650)

AI/ML

Review (4456)

OpenSource

PR Analysis (938)
vLLM (71)
SGLang (130)
llm-compressor (45)

#Prefill

2 posts

[Paper Review] Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

This paper targets two bottlenecks of long-context inference in decoder-only models: the high compute cost of the prefill stage and the KV-cache memory-bandwidth limit of the decode stage.

#Review #Long-Context Inference #KV-Cache #Phase-Asymmetric #Prefill #Decode #Transformer

May 10, 2026

[flashinfer] FlashInfer Maximizes Prefill Performance by Integrating CuTe DSL-Based FMHA Kernels

FlashInfer has integrated CuTe DSL FMHA kernels to optimize prefill performance.

#FlashInfer #CuTe DSL #FMHA #Prefill #Optimization #Performance Improvement #Deep Learning #LLM

April 24, 2026

© 2026 secrett2633. All rights reserved.