#SLO-aware Scheduling

1 post

[Paper Review] FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving

This paper addresses the head-of-line (HoL) blocking problem that arises during the compute-intensive prefill phase of LLM serving systems.

#Review #LLM Serving #Head-of-Line Blocking #Preemption #Prefill Scheduling #Time-to-First-Token (TTFT) #SLO-aware Scheduling #Operator-Level Preemption #Event-Driven Scheduling

February 24, 2026