[SGLang] 캐시 Eviction 정책: LRU, LFU, FIFO 비교 분석

2026년 4월 10일수정: 2026년 4월 10일

들어가며

KV Cache의 GPU 메모리는 유한하다. 새로운 요청이 들어오면 오래된 캐시를 제거해야 한다. 이때 "어떤 캐시를 먼저 제거할 것인가"가 eviction 정책의 핵심이다.

SGLang은 python/sglang/srt/mem_cache/evict_policy.py에서 7가지 eviction 전략을 Strategy 패턴으로 구현한다. 각 전략은 단 하나의 메서드 get_priority()만 구현하며, 반환값이 작을수록 먼저 evict된다.

구조도

EvictionStrategy (ABC)
├── LRUStrategy        ← 가장 오래 전에 접근된 노드 우선
├── LFUStrategy        ← 가장 적게 접근된 노드 우선
├── FIFOStrategy       ← 가장 먼저 생성된 노드 우선
├── MRUStrategy        ← 가장 최근에 접근된 노드 우선
├── FILOStrategy       ← 가장 최근에 생성된 노드 우선
├── PriorityStrategy   ← 외부 우선순위 기반
└── SLRUStrategy       ← Segmented LRU (2단계)

TreeNode 속성:
┌─────────────────────────┐
│ last_access_time: float │ ← LRU, MRU에서 사용
│ hit_count: int          │ ← LFU, SLRU에서 사용
│ creation_time: float    │ ← FIFO, FILO에서 사용
│ priority: int           │ ← PriorityStrategy에서 사용
└─────────────────────────┘

핵심 인터페이스: EvictionStrategy

모든 eviction 정책은 EvictionStrategy ABC를 상속한다. get_priority()가 반환하는 값으로 노드의 eviction 순서가 결정된다.

class EvictionStrategy(ABC):
    @abstractmethod
    def get_priority(self, node: "TreeNode") -> Union[float, Tuple]:
        pass

반환 타입이 float 또는 Tuple인 이유는 Python의 tuple 비교 특성을 활용하기 위해서다. (0, 100.0) < (1, 50.0)이므로, 다중 기준 정렬을 자연스럽게 표현할 수 있다.

기본 전략: LRU, LFU, FIFO

가장 단순한 세 가지 전략은 각각 노드의 단일 속성만 참조한다.

class LRUStrategy(EvictionStrategy):
    def get_priority(self, node: "TreeNode") -> float:
        return node.last_access_time

class LFUStrategy(EvictionStrategy):
    def get_priority(self, node: "TreeNode") -> Tuple[int, float]:
        return (node.hit_count, node.last_access_time)

class FIFOStrategy(EvictionStrategy):
    def get_priority(self, node: "TreeNode") -> float:
        return node.creation_time

LFU는 (hit_count, last_access_time) 튜플을 반환한다. hit_count가 같으면 last_access_time으로 tie-breaking한다. 즉, 접근 횟수가 적고 오래된 노드가 먼저 evict된다.

역방향 전략: MRU, FILO

LRU와 FIFO의 정반대 전략도 제공한다. 부호를 뒤집는 것만으로 구현된다.

class MRUStrategy(EvictionStrategy):
    def get_priority(self, node: "TreeNode") -> float:
        return -node.last_access_time

class FILOStrategy(EvictionStrategy):
    def get_priority(self, node: "TreeNode") -> float:
        return -node.creation_time

MRU는 가장 최근에 접근된 노드를 먼저 제거한다. 스캔 패턴(전체 데이터를 한 번 훑는 접근)에서 LRU보다 효과적일 수 있다. 방금 스캔한 데이터는 다시 접근할 가능성이 낮기 때문이다.

PriorityStrategy: 외부 우선순위

외부에서 노드별 priority를 지정할 수 있는 전략이다.

class PriorityStrategy(EvictionStrategy):
    """Priority-aware eviction: lower priority values evicted first,
    then LRU within same priority."""

    def get_priority(self, node: "TreeNode") -> Tuple[int, float]:
        return (node.priority, node.last_access_time)

priority 값이 낮을수록 먼저 evict된다. 같은 priority 내에서는 LRU를 따른다. 시스템 프롬프트처럼 항상 유지해야 하는 캐시에 높은 priority를 부여하는 용도로 적합하다.

SLRUStrategy: 세그먼트 분리 LRU

가장 정교한 전략이다. 노드를 probationary(시험) 세그먼트와 protected(보호) 세그먼트로 나눈다.

class SLRUStrategy(EvictionStrategy):
    def __init__(self, protected_threshold: int = 2):
        self.protected_threshold = protected_threshold

    def get_priority(self, node: "TreeNode") -> Tuple[int, float]:
        is_protected = 1 if node.hit_count >= self.protected_threshold else 0
        return (is_protected, node.last_access_time)

동작 원리는 다음과 같다.

요청이 들어옴 → hit_count 증가

hit_count < threshold (기본 2)
  → Probationary 세그먼트 (is_protected=0)
  → 먼저 evict 대상

hit_count >= threshold
  → Protected 세그먼트 (is_protected=1)
  → Probationary가 모두 evict된 후에야 대상

튜플 비교에서 (0, time) < (1, time)이므로, probationary 노드가 항상 protected 노드보다 먼저 evict된다. 이는 한 번만 접근된 데이터가 자주 접근되는 데이터를 밀어내는 cache pollution 문제를 방지한다.

Eviction 정책 비교표

정책	기준	반환 타입	장점	단점	적합한 시나리오
LRU	마지막 접근 시간	`float`	단순, 범용적	스캔 패턴에 취약	일반적인 대화형 서빙
LFU	접근 횟수 + 시간	`Tuple`	인기 데이터 보호	과거 인기 데이터 고착	시스템 프롬프트 공유
FIFO	생성 시간	`float`	구현 단순, 예측 가능	접근 패턴 무시	배치 처리, 파이프라인
MRU	최근 접근 (역순)	`float`	스캔 패턴에 강함	일반 워크로드에 부적합	대규모 컨텍스트 스캔
FILO	생성 시간 (역순)	`float`	오래된 데이터 보호	최신 데이터 손실 위험	장기 세션 유지
Priority	외부 우선순위 + LRU	`Tuple`	세밀한 제어	우선순위 관리 필요	멀티테넌트 서빙
SLRU	세그먼트 + LRU	`Tuple`	pollution 방지	threshold 튜닝 필요	혼합 워크로드

설계 근거: Strategy 패턴의 이점

모든 전략이 get_priority() 하나만 구현하므로, Radix Cache의 eviction 루프를 수정하지 않고 정책을 교체할 수 있다. eviction 루프는 노드를 priority 순으로 정렬하고 가장 작은 값부터 제거하면 된다.

Eviction Loop (RadixCache 내부):
  nodes = collect_evictable_nodes()
  sort by strategy.get_priority(node)  ← 전략 교체 지점
  while need_more_space:
      evict(nodes.pop_min())

새 전략을 추가하려면 EvictionStrategy를 상속하고 get_priority()만 구현하면 된다. 기존 캐시 코드는 전혀 변경하지 않는다.

참고

SGLang 의 다른글

이전글 [SGLang] Mamba Radix Cache: SSM 모델을 위한 상태 캐싱
현재글 : [SGLang] 캐시 Eviction 정책: LRU, LFU, FIFO 비교 분석
다음글 [SGLang] Hybrid Cache Controller: GPU/CPU 하이브리드 캐시 관리