[Open WebUI] Removing a double RAF cuts streaming display latency from 32ms to 16ms

March 25, 2026 | Updated: March 25, 2026

PR link: open-webui/open-webui#23016 | Status: Merged | Changes: +1 / -16

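Introduction

A performance optimization sometimes overlaps with one that already exists and ends up adding latency instead of removing it. An earlier PR (#22947) added scheduleHistoryFlush, a throttle based on requestAnimationFrame (RAF), but pendingRebuild in Messages.svelte was already doing the same job.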

Core code analysis

Removing the double RAF chain

The problematic data flow:

token → scheduleHistoryFlush (RAF 1, ~16ms)
  → pendingRebuild (RAF 2, ~16ms)
    → buildMessages()
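
The flow above can be reproduced with a small simulation. This is only a sketch: createFrameClock is a hypothetical stand-in for requestAnimationFrame so the code runs outside a browser, and onHistoryChange is an assumed simplification of what pendingRebuild does, not the actual Messages.svelte code.

```javascript
// Minimal fake frame scheduler standing in for requestAnimationFrame (hypothetical helper).
function createFrameClock() {
  let queue = [];
  let frame = 0;
  return {
    // Like requestAnimationFrame: the callback runs on the next tick.
    raf(cb) { queue.push(cb); },
    // Run one frame. Callbacks scheduled during this tick wait for the next one.
    tick() {
      frame += 1;
      const current = queue;
      queue = [];
      for (const cb of current) cb(frame);
    },
    get frame() { return frame; },
  };
}

// Returns the frame on which buildMessages runs for a token that arrives at frame 0.
function framesUntilRender(chained) {
  const clock = createFrameClock();
  let renderedAt = null;
  const buildMessages = () => { renderedAt = clock.frame; };
  // Assumed stand-in for pendingRebuild: defer the rebuild by one frame.
  const onHistoryChange = () => clock.raf(buildMessages);

  if (chained) {
    // Before: scheduleHistoryFlush defers the history update by one frame first.
    clock.raf(onHistoryChange);
  } else {
    // After: the history update happens immediately.
    onHistoryChange();
  }
  while (renderedAt === null) clock.tick();
  return renderedAt;
}

const doubleRafFrames = framesUntilRender(true);
const singleRafFrames = framesUntilRender(false);
console.log({ doubleRafFrames, singleRafFrames }); // { doubleRafFrames: 2, singleRafFrames: 1 }
```

The key detail is that a callback scheduled from inside a frame callback cannot run in that same frame, so chaining two RAF-based throttles always costs an extra frame.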

Before:

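// Excerpt 1: event handler. Streaming events were deferred to the next frame;
// every other event type cancelled the pending flush and wrote immediately.
if (type === 'chat:message:delta' || type === 'message' || type === 'status') {
    scheduleHistoryFlush();
} else {
    cancelAnimationFrame(historyRAF);
    historyRAF = null;
    history.messages[event.message_id] = message;
}

// Excerpt 2: the RAF throttle itself. At most one flush is queued per frame.
let historyRAF = null;
const scheduleHistoryFlush = () => {
    if (!historyRAF) {
        historyRAF = requestAnimationFrame(() => {
            historyRAF = null;
            history = history; // self-assignment to trigger Svelte reactivity
        });
    }
};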

After:

history.messages[event.message_id] = message;

All 15 lines related to scheduleHistoryFlush are removed, and the code goes back to a plain direct assignment. Once Svelte's reactivity system detects the change to history, pendingRebuild calls buildMessages() on the next frame.
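
To illustrate why a single downstream throttle is enough, here is a minimal sketch of a one-frame rebuild throttle. The names createRebuildThrottle, fakeRaf, runFrame, and chatHistory are hypothetical, and the flag logic is an assumption about how a pendingRebuild-style guard typically works, not the real Messages.svelte implementation.

```javascript
// Hypothetical one-frame rebuild throttle; not the actual Messages.svelte code.
function createRebuildThrottle(raf, buildMessages) {
  let pendingRebuild = false;
  return function onHistoryChange() {
    if (pendingRebuild) return; // a rebuild is already queued for the next frame
    pendingRebuild = true;
    raf(() => {
      pendingRebuild = false;
      buildMessages();
    });
  };
}

// Fake frame queue so the sketch runs outside a browser.
const frameQueue = [];
const fakeRaf = (cb) => frameQueue.push(cb);
const runFrame = () => frameQueue.splice(0).forEach((cb) => cb());

const chatHistory = { messages: {} };
let rebuildCount = 0;
const notify = createRebuildThrottle(fakeRaf, () => { rebuildCount += 1; });

// Five tokens arrive within one frame; each one is a direct assignment.
for (const token of ['He', 'llo', ',', ' wor', 'ld']) {
  const prev = chatHistory.messages['m1']?.content ?? '';
  chatHistory.messages['m1'] = { content: prev + token };
  notify();
}
runFrame();
console.log(rebuildCount, chatHistory.messages['m1'].content); // 1 'Hello, world'
```

Under these assumptions, direct assignment keeps the data current on every token while the rebuild still runs at most once per frame, so a second throttle in front of it adds delay without reducing work.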

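Why this is better

The delay before a streamed token appears on screen drops from about 32ms (two RAFs) to about 16ms (one RAF), assuming a 60 Hz display where one frame takes roughly 16ms.
With fast models that emit dozens of tokens per second, the output feels noticeably smoother.
The unneeded state variable (historyRAF) and function (scheduleHistoryFlush) are gone, so the code is simpler.
It is a good example of the mistake of adding a duplicate optimization without first identifying the throttling mechanism that already works.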
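
The latency figures can be checked with simple arithmetic. The sketch below assumes each chained RAF hop waits up to one full frame; rafChainLatencyMs is a hypothetical helper, and the result is an upper-bound estimate, not a measurement from the PR.

```javascript
// Upper-bound estimate: each chained requestAnimationFrame hop waits up to one frame.
function rafChainLatencyMs(hops, refreshHz = 60) {
  const frameMs = 1000 / refreshHz;
  return hops * frameMs;
}

const before = rafChainLatencyMs(2); // scheduleHistoryFlush + pendingRebuild
const after = rafChainLatencyMs(1);  // pendingRebuild only
console.log(before.toFixed(1), after.toFixed(1)); // 33.3 16.7
```

The article's 32ms and 16ms come from rounding one frame down to 16ms. On a higher refresh rate display the absolute numbers shrink, but removing one hop still halves the wait.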
