[sglang] Simplifying flush_cache: rejecting concurrent requests and removing client retries

March 26, 2026 | Updated: March 26, 2026

PR link: sgl-project/sglang#21490 | Status: Merged | Changes: +110 / -110

Introduction

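The /flush_cache API is the endpoint that clears the radix cache of an SGLang server. The previous implementation queued multiple flush requests in a Deque, and clients polled with retries through the flush_cache_with_retry() utility. This PR allows only one flush at a time, supports a server-side timeout, and removes the client retry logic.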

Key Code Analysis

1. From Deque to Optional

Before:

self._pending_flush: Deque[Tuple[FlushCacheReqInput, float]] = deque()

def flush_cache_wrapped(self, recv_req):
    # ...
    self._pending_flush.append((recv_req, time.monotonic() + timeout_s))

After:

self._pending_flush: Optional[Tuple[FlushCacheReqInput, float]] = None

def flush_cache_wrapped(self, recv_req):
    if self._pending_flush is not None:
        return FlushCacheReqOutput(
            success=False,
            message="Another flush_cache is already in progress.")
    self._pending_flush = (recv_req, time.monotonic() + timeout_s)

While one flush is pending, any further flush request is rejected immediately with success=False instead of being queued.
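The single-slot behavior can be illustrated with a small standalone model. This is only a sketch: `FlushResult` and `SingleSlotFlushGate` are hypothetical names invented for illustration, not SGLang classes, and the real scheduler resolves the pending request inside its own event loop.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FlushResult:
    # Hypothetical stand-in for FlushCacheReqOutput; not the real SGLang class.
    success: bool
    message: str = ""


class SingleSlotFlushGate:
    """Toy model of a single pending-flush slot (illustrative only)."""

    def __init__(self) -> None:
        # None means no flush is pending; otherwise (request_id, deadline).
        self._pending: Optional[Tuple[str, float]] = None

    def submit(self, request_id: str, timeout_s: float) -> Optional[FlushResult]:
        # Reject immediately when the slot is already occupied.
        if self._pending is not None:
            return FlushResult(False, "Another flush_cache is already in progress.")
        self._pending = (request_id, time.monotonic() + timeout_s)
        return None  # accepted; the result is produced later

    def complete(self) -> FlushResult:
        # Free the slot so that the next flush can be accepted.
        self._pending = None
        return FlushResult(True)


gate = SingleSlotFlushGate()
first = gate.submit("a", timeout_s=5.0)   # accepted
second = gate.submit("b", timeout_s=5.0)  # rejected: slot occupied
done = gate.complete()                    # first flush finishes
third = gate.submit("c", timeout_s=5.0)   # accepted again
```

Because the slot holds at most one entry, there is no queue to drain and no per-entry expiry to track, which is the simplification the Optional type expresses.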

2. Removing the client-side retry logic

Before:

def flush_cache_with_retry(base_url, timeout=30.0, poll_interval=0.5):
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = requests.post(f"{base_url}/flush_cache", timeout=10)
        if response.status_code == 200:
            return True
        time.sleep(poll_interval)
    return False

After:

# The flush_cache_with_retry function is deleted entirely.
# Test code relies on the server-side timeout instead:
res = requests.post(
    f"{self.base_url}/flush_cache",
    params={"timeout": 30},
    timeout=40)
res.raise_for_status()

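Because the server itself waits up to the timeout for an idle state, the client no longer needs to poll. In the test snippet the HTTP client timeout (40 seconds) is longer than the server-side timeout (30 seconds), presumably so that the server can report its own result before the connection gives up.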
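One way to picture the server-side wait is a check that runs on each scheduler tick and resolves the pending flush once the server is idle or the deadline has passed. The function below is a minimal sketch under that assumption; `resolve_pending_flush`, the `is_idle` callback, and the message strings are hypothetical and are not taken from the PR.

```python
import time
from typing import Callable, Optional, Tuple


def resolve_pending_flush(
    pending: Optional[Tuple[str, float]],
    is_idle: Callable[[], bool],
    now: Optional[float] = None,
) -> Optional[Tuple[bool, str]]:
    """Return (success, message) once the pending flush can be resolved, else None."""
    if pending is None:
        return None  # nothing to do
    _request_id, deadline = pending
    current = time.monotonic() if now is None else now
    if is_idle():
        # The server is idle, so the cache can be flushed right away.
        return True, "Cache flushed."
    if current >= deadline:
        # Still busy after the deadline: report a failure instead of waiting forever.
        return False, "Timed out waiting for the server to become idle."
    return None  # still busy but within the deadline; check again on the next tick


pending = ("req-1", 10.0)  # (request id, deadline on a fake clock)
still_waiting = resolve_pending_flush(pending, lambda: False, now=5.0)
flushed = resolve_pending_flush(pending, lambda: True, now=5.0)
timed_out = resolve_pending_flush(pending, lambda: False, now=10.0)
```

With a check like this on the server, a single HTTP request either succeeds or fails with a reason, so the polling loop on the client becomes unnecessary.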

3. Improved error messages

Before:

return Response(
    content="Cache flushed.\nPlease check backend logs...",
    status_code=200 if ret.success else HTTPStatus.BAD_REQUEST)

After:

if ret.success:
    content = "Cache flushed.\nPlease check backend logs..."
else:
    content = ret.message or "Flush cache failed.\n"
return Response(content=content, status_code=200 if ret.success else HTTPStatus.BAD_REQUEST)

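The reason for a failure (timeout, a flush already in progress, and so on) is now passed to the client in the response body. Previously the body said "Cache flushed." even when the status code was 400.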
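On the client side, the response body can now be surfaced directly in an error. The helper below is a hedged sketch of how a caller might do that; `FlushCacheError` and `check_flush_response` are hypothetical names, and it takes a plain status code and body so it does not depend on any particular HTTP library.

```python
class FlushCacheError(RuntimeError):
    """Hypothetical exception type used only in this illustration."""


def check_flush_response(status_code: int, body: str) -> None:
    """Raise FlushCacheError carrying the server's reason when the flush failed."""
    if status_code == 200:
        return
    # The server now places the failure reason in the response body.
    reason = body.strip() or "no reason given"
    raise FlushCacheError(f"flush_cache failed (HTTP {status_code}): {reason}")


# A successful flush passes silently.
check_flush_response(200, "Cache flushed.\n")

# A rejected flush raises with the server-provided reason.
try:
    check_flush_response(400, "Another flush_cache is already in progress.")
    error_text = ""
except FlushCacheError as exc:
    error_text = str(exc)
```

With requests, this could be called as `check_flush_response(res.status_code, res.text)` in place of `res.raise_for_status()` when the failure reason should appear in the exception message.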

Why this is better

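Lower complexity: logic that is no longer needed, such as Deque management, expiry handling, and client retries, is removed.
Clear semantics: the constraint that only one flush can run at a time is visible at the type level (Optional vs. Deque).
Simpler tests: the flush_cache_with_retry utility and the time.sleep calls disappear from the tests, which reduces the chance of flaky tests.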

Summary

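As the symmetric +110/-110 change count suggests, this is a refactoring that replaces a complex implementation with a simple one. Rejecting concurrent flushes is safer and more predictable than queuing them.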

References

⚠️ Note: This analysis was written by AI based on the actual code diff.
