[Gradio] Building Backend Profiling and Benchmark Infrastructure

March 24, 2026 | Updated: March 24, 2026

PR link: gradio-app/gradio#13032 | Status: Merged | Changes: +1400 / -43

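Introduction

To find performance bottlenecks in a Gradio server, you need to know how long each stage of request handling (preprocess, function call, postprocess, streaming diff) takes. Until now there was no built-in tool for measuring this, so developers had to insert time.monotonic() calls by hand throughout the code. This PR introduces a profiling module (gradio.profiling) that can be switched on and off with a single environment variable, along with benchmark apps that cover a range of scenarios.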

Core Code Analysis

1. The RequestTrace Data Structure

A dataclass that records the time spent in each processing stage of a request.

import time
from dataclasses import dataclass, field


@dataclass
class RequestTrace:
    event_id: str | None = None
    fn_name: str | None = None
    session_hash: str | None = None
    timestamp: float = field(default_factory=time.time)

    queue_wait_ms: float = 0.0
    preprocess_ms: float = 0.0
    fn_call_ms: float = 0.0
    postprocess_ms: float = 0.0
    streaming_diff_ms: float = 0.0
    total_ms: float = 0.0
    n_iterations: int = 0

The time for each stage (preprocess, fn_call, postprocess, streaming_diff) is accumulated in milliseconds. For generator functions, n_iterations tracks the number of iterations.
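
To make the accumulation concrete, here is a minimal sketch using a trimmed copy of the dataclass. The record_iteration helper is hypothetical and not part of the PR; it only illustrates how per-step timings of a generator could add up on one trace.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestTrace:
    # Trimmed copy of the fields shown above, for illustration only.
    fn_name: Optional[str] = None
    timestamp: float = field(default_factory=time.time)
    fn_call_ms: float = 0.0
    streaming_diff_ms: float = 0.0
    n_iterations: int = 0


def record_iteration(trace: RequestTrace, fn_call_ms: float, diff_ms: float) -> None:
    """Hypothetical helper: add one generator step's timings to the trace."""
    trace.fn_call_ms += fn_call_ms
    trace.streaming_diff_ms += diff_ms
    trace.n_iterations += 1


trace = RequestTrace(fn_name="llm_chat")
for step_ms in (12.0, 8.0, 10.0):
    record_iteration(trace, fn_call_ms=step_ms, diff_ms=0.5)

print(trace.fn_call_ms, trace.streaming_diff_ms, trace.n_iterations)
```

Because the fields hold running totals, dividing fn_call_ms by n_iterations gives a rough per-step average for a streaming function.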

2. The trace_phase Context Manager

Before (no profiling):

inputs = await self.preprocess_data(
    block_fn, inputs, state, explicit_call
)
result = await self.call_function(
    block_fn, inputs, old_iterator, request,
    event_id, event_data, in_event_listener, state,
)
data = await self.postprocess_data(block_fn, result["prediction"], state)

After (with profiling):

from gradio.profiling import trace_phase

async with trace_phase("preprocess"):
    inputs = await self.preprocess_data(
        block_fn, inputs, state, explicit_call
    )
async with trace_phase("fn_call"):
    result = await self.call_function(
        block_fn, inputs, old_iterator, request,
        event_id, event_data, in_event_listener, state,
    )
async with trace_phase("postprocess"):
    data = await self.postprocess_data(
        block_fn, result["prediction"], state
    )

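trace_phase is an async context manager that automatically records the elapsed time of the wrapped stage on the RequestTrace of the current request context. That context is tracked per asyncio task through a ContextVar, not per thread.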
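
The enabled implementation is not quoted in this post, so the following is only a sketch of how such a context manager might work, assuming the ContextVar-based design mentioned later. The variable _current_trace and the function handle_request are hypothetical names chosen for illustration.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTrace:
    preprocess_ms: float = 0.0
    fn_call_ms: float = 0.0
    postprocess_ms: float = 0.0


# Hypothetical name: holds the trace of whichever request is currently running.
_current_trace: ContextVar[Optional[RequestTrace]] = ContextVar(
    "current_trace", default=None
)


@asynccontextmanager
async def trace_phase(name: str):
    start = time.monotonic()
    try:
        yield
    finally:
        trace = _current_trace.get()
        if trace is not None:
            attr = f"{name}_ms"
            elapsed_ms = (time.monotonic() - start) * 1000
            # Accumulate, so repeated phases (e.g. generator steps) add up.
            setattr(trace, attr, getattr(trace, attr) + elapsed_ms)


async def handle_request(delay: float) -> RequestTrace:
    trace = RequestTrace()
    _current_trace.set(trace)  # visible only inside this task's context
    async with trace_phase("fn_call"):
        await asyncio.sleep(delay)
    return trace


async def main():
    # gather() runs each coroutine in its own task with its own context copy.
    return await asyncio.gather(handle_request(0.02), handle_request(0.1))


fast, slow = asyncio.run(main())
print(round(fast.fn_call_ms), round(slow.fn_call_ms))
```

Each task sets the variable in its own copy of the context, so the two concurrent requests write to separate RequestTrace objects even though they share one thread.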

3. Zero-overhead Disabling

When profiling is off, trace_phase is replaced with a no-op, so the performance impact is negligible.

import os
from contextlib import asynccontextmanager

PROFILING_ENABLED = os.environ.get("GRADIO_PROFILING", "").strip() in ("1", "true")

if not PROFILING_ENABLED:
    @asynccontextmanager
    async def trace_phase(name: str):  # noqa: ARG001
        yield

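Unless the GRADIO_PROFILING environment variable is set to 1 or true (whitespace is stripped and the comparison is case-sensitive), trace_phase becomes an empty context manager that does nothing.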
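
The exact-match check has a consequence worth seeing in isolation. The sketch below wraps the same expression in a hypothetical profiling_enabled function (not part of the PR) so that different values can be tried without touching the real environment.

```python
from typing import Optional


def profiling_enabled(raw: Optional[str]) -> bool:
    """Mirror the check shown above: strip whitespace, then exact match."""
    return (raw or "").strip() in ("1", "true")


results = {
    value: profiling_enabled(value)
    for value in ("1", " true ", "True", "yes", "")
}
print(results)
```

Under this check, values such as True or yes leave profiling disabled, so 1 or lowercase true are the safe choices when exporting the variable.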

4. HTTP Endpoints

When profiling is enabled, three REST endpoints are registered automatically.

if PROFILING_ENABLED:
    @router.get("/profiling/traces")
    async def profiling_traces(last_n: int | None = None):
        return ORJSONResponse(collector.get_all(last_n=last_n))

    @router.get("/profiling/summary")
    async def profiling_summary():
        return ORJSONResponse(collector.get_summary())

    @router.post("/profiling/clear")
    async def profiling_clear():
        collector.clear()
        return ORJSONResponse({"status": "cleared"})

/profiling/summary returns a statistical summary that includes the p50, p90, p95, and p99 percentiles.
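
To show what those percentiles mean, here is a dependency-free sketch. It is not the PR's code, which is described as numpy-based; this version swaps in a hand-written percentile function that follows the same linear-interpolation rule as numpy.percentile's default. The summarize function and its output keys are assumptions, not the actual response shape.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile over a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100  # fractional index into the sorted data
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(samples_ms):
    """Hypothetical summary layout: one entry per reported percentile."""
    return {f"p{q}": percentile(samples_ms, q) for q in (50, 90, 95, 99)}


total_ms = [float(n) for n in range(1, 102)]  # synthetic samples: 1.0 .. 101.0
summary = summarize(total_ms)
print(summary)
```

Tail percentiles such as p95 and p99 matter here because a mean would hide the occasional slow request that users actually notice.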

5. Queue Wait Time Tracking

enqueue_time is recorded when a queue event is created, and the wait time is computed when processing starts.

Before:

class EventQueue:
    def __init__(self, ...):
        self.closed = False
        self.n_calls = 0
        self.run_time: float = 0
        self.signal = asyncio.Event()

After:

class EventQueue:
    def __init__(self, ...):
        self.closed = False
        self.n_calls = 0
        self.run_time: float = 0
        self.enqueue_time: float = time.monotonic()  # added
        self.signal = asyncio.Event()
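
The post does not quote the code that turns enqueue_time into queue_wait_ms, so the following is a sketch under that assumption. The EventQueue here is a stripped-down stand-in, and queue_wait_ms is a hypothetical helper.

```python
import time
from typing import Optional


class EventQueue:
    """Stand-in for the class above; only the timing field is kept."""

    def __init__(self):
        self.enqueue_time: float = time.monotonic()


def queue_wait_ms(queue: EventQueue, now: Optional[float] = None) -> float:
    """Hypothetical helper: milliseconds between enqueue and processing start."""
    current = time.monotonic() if now is None else now
    return (current - queue.enqueue_time) * 1000


queue = EventQueue()
# Pretend processing starts 250 ms after the event was enqueued.
wait = queue_wait_ms(queue, now=queue.enqueue_time + 0.25)
print(round(wait))
```

time.monotonic() suits this measurement because it never goes backwards when the system clock is adjusted, unlike time.time().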

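Why This Is Good

Near-zero overhead: when profiling is disabled, trace_phase is swapped for a no-op, so production performance is effectively unaffected
ContextVar-based: traces from different requests do not mix in an asyncio environment
Benchmark apps included: covers a range of scenarios such as echo_text (pure overhead), file_heavy (file I/O), llm_chat (streaming), and stateful_counter (state management)
Statistical summary: numpy-based p50/p90/p95/p99 percentiles provide practical performance metrics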

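Summary

This PR lays the groundwork for observability in the Gradio server. Profiling is enabled with a single environment variable and results can be collected over a REST API, which makes it easy to integrate into a CI/CD pipeline. In particular, this tool likely played a key role in quantitatively verifying the effect of the MCP call optimizations (#12296, #12961).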

References

gradio-app/gradio#13032: original PR
Python contextvars: official ContextVar documentation

⚠️ Note: This analysis was written by AI based on the actual code diff.
