[SGLang] Debug Utils: Tensor Comparison and Schedule Simulator

April 14, 2026 (updated April 14, 2026)

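Introduction

Debugging an LLM inference engine poses challenges that ordinary software does not. You need to detect the subtle differences that quantization, Tensor Parallel, and kernel changes introduce into the output, and you need to simulate how a scheduling strategy behaves. SGLang's debug_utils/ provides specialized tools for both.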

Directory Layout

debug_utils/
├── comparator/                ── tensor comparison system
│   ├── entrypoint.py          ── CLI entry point
│   ├── bundle_comparator.py   ── bundle-level comparison
│   ├── bundle_matcher.py      ── tensor bundle matching
│   ├── tensor_comparator/     ── per-tensor comparison logic
│   ├── aligner/               ── token aligner
│   ├── per_token_visualizer.py ── per-token heatmap
│   ├── dims_spec/             ── dimension specs
│   └── preset.py              ── predefined presets
├── schedule_simulator/        ── schedule simulator
│   ├── simulator.py           ── simulation engine
│   ├── gpu_state.py           ── GPU state model
│   ├── request.py             ── simulated requests
│   ├── schedulers/            ── scheduler policies
│   ├── routers/               ── router policies
│   └── metrics.py             ── metric recording
├── dumper.py                  ── tensor dump configuration
├── dump_loader.py             ── dump file loading
├── dump_comparator.py         ── dump comparison
├── log_parser.py              ── log parser
├── model_truncator.py         ── model truncation
└── text_comparator.py         ── text comparison

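Core Code Analysis

Comparator: Tensor Value Comparison System

Comparator compares the tensor values of two runs (baseline vs. target) layer by layer and token by token. It uses polars DataFrames to process large result sets efficiently.

# entrypoint.py - key imports and structure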
from sglang.srt.debug_utils.comparator.bundle_comparator import compare_bundle_pair
from sglang.srt.debug_utils.comparator.bundle_matcher import (
    TensorBundleInfo, match_bundles)
from sglang.srt.debug_utils.comparator.per_token_visualizer import (
    generate_per_token_heatmap)
from sglang.srt.debug_utils.comparator.preset import PRESETS, expand_preset
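
To make the kind of check described above concrete, here is a minimal pure-Python sketch of a per-token comparison between a baseline and a target activation. It is not SGLang's implementation: `TokenDiff`, `compare_per_token`, the two metrics, and the `rel_tol` threshold are all assumptions chosen for illustration, and nested lists stand in for real tensors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenDiff:
    """Per-token comparison result (hypothetical structure)."""
    token: int
    max_abs_diff: float
    rel_diff: float
    passed: bool


def compare_per_token(
    baseline: List[List[float]],
    target: List[List[float]],
    rel_tol: float = 1e-3,  # assumed tolerance, not an SGLang default
) -> List[TokenDiff]:
    """Compare two [num_tokens][hidden] activations row by row."""
    if len(baseline) != len(target):
        raise ValueError("token counts differ; align the sequences first")

    rows = []
    for token_idx, (b_row, t_row) in enumerate(zip(baseline, target)):
        if len(b_row) != len(t_row):
            raise ValueError(f"hidden size differs at token {token_idx}")
        max_abs_diff = max(abs(b - t) for b, t in zip(b_row, t_row))
        # Normalize by the largest baseline magnitude in the row so the
        # metric does not depend on the absolute scale of the activations.
        scale = max(max(abs(b) for b in b_row), 1e-12)
        rel_diff = max_abs_diff / scale
        rows.append(TokenDiff(token_idx, max_abs_diff, rel_diff, rel_diff <= rel_tol))
    return rows


baseline = [[1.0, 2.0, 4.0], [0.5, -0.5, 1.0]]
target = [[1.0, 2.0, 4.0], [0.5, -0.5, 1.5]]
report = compare_per_token(baseline, target)
failing_tokens = [row.token for row in report if not row.passed]
```

In this sketch only the second token diverges, so `failing_tokens` contains index 1. One row per token is also the natural shape to load into a DataFrame or render as a heatmap.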

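Dimension-spec overrides let you correct how a tensor is interpreted at comparison time, so a wrong dims annotation does not force you to re-run an expensive dump.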

_DIMS_DEBUG_HINT = (
    "\nHint: If this is a dims annotation issue, do NOT re-run expensive dumps.\n"
    "Use --override-dims at comparison time, e.g.:\n"
    '  python -m sglang.srt.debug_utils.comparator '
    '--override-dims "tensor_name:b s h[tp] d"\n')
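
The override string in the hint has a visible shape: a tensor name, a colon, then space-separated dimension names, one of which carries a bracketed annotation. The sketch below parses exactly that shape. It is an illustration only: `DimSpec` and `parse_override_dims` are hypothetical names, the real dims_spec grammar may be richer, and reading `[tp]` as "sharded across tensor-parallel ranks" is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A dimension token is a name with an optional bracketed annotation: "h[tp]".
_DIM_PATTERN = re.compile(r"^(?P<name>\w+)(?:\[(?P<annotation>[^\]]+)\])?$")


@dataclass(frozen=True)
class DimSpec:
    """One parsed dimension (hypothetical structure)."""
    name: str
    annotation: Optional[str] = None  # e.g. "tp" in "h[tp]"


def parse_override_dims(spec: str) -> Tuple[str, List[DimSpec]]:
    """Split 'tensor_name:dim dim ...' into the name and its dimensions."""
    tensor_name, sep, dims_text = spec.partition(":")
    if not sep or not tensor_name.strip() or not dims_text.strip():
        raise ValueError(f"expected 'tensor_name:dim dim ...', got {spec!r}")

    dims = []
    for token in dims_text.split():
        match = _DIM_PATTERN.match(token)
        if match is None:
            raise ValueError(f"cannot parse dimension {token!r}")
        dims.append(DimSpec(match["name"], match["annotation"]))
    return tensor_name.strip(), dims


name, dims = parse_override_dims("attn_output:b s h[tp] d")
```

Here `name` is `"attn_output"` and `dims` holds four entries, with only `h` carrying the `tp` annotation.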

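Dumper: Tensor Dump Configuration

dumper.py uses a configuration system driven by environment variables. Config classes inherit from _BaseConfig, which automates type validation and environment-variable parsing.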

@dataclass(frozen=True)
class _BaseConfig(ABC):
    def __post_init__(self) -> None:
        self._verify_types()

    @classmethod
    def from_env(cls) -> "_BaseConfig":
        return cls(**{
            f.name: cls._parse_env_field(cls._env_name(f.name), f.default)
            for f in fields(cls)
        })

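    def with_defaults(self, **kwargs) -> "_BaseConfig":
        # `cls` is not bound in an instance method, so resolve it explicitly.
        cls = type(self)
        # Apply a default only when the matching environment variable is unset.
        actual = {key: value for key, value in kwargs.items()
                  if os.getenv(cls._env_name(key)) is None}
        return replace(self, **actual) if actual else self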
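
The pattern in this excerpt (a frozen dataclass whose fields are filled from environment variables, with code-level defaults that yield to anything the user set) can be tried in isolation. The following is a self-contained sketch, not the dumper's code: the class `DemoDumpConfig`, its fields, the `DEMO_DUMP_` prefix, and the parsing rules are all made up for illustration.

```python
import os
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class DemoDumpConfig:
    """Hypothetical config; field names and the env prefix are invented."""

    enable: bool = False
    output_dir: str = "/tmp/demo_dump"
    max_steps: int = 0

    @staticmethod
    def _env_name(field_name: str) -> str:
        return "DEMO_DUMP_" + field_name.upper()

    @staticmethod
    def _parse(raw: str, default):
        # Convert the raw string to the type of the field's default value.
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(default, bool):
            return raw.strip().lower() in ("1", "true", "yes")
        return type(default)(raw)

    @classmethod
    def from_env(cls) -> "DemoDumpConfig":
        values = {}
        for f in fields(cls):
            raw = os.getenv(cls._env_name(f.name))
            values[f.name] = f.default if raw is None else cls._parse(raw, f.default)
        return cls(**values)

    def with_defaults(self, **kwargs) -> "DemoDumpConfig":
        # A code-level default applies only if the user did not set the env var.
        cls = type(self)
        actual = {key: value for key, value in kwargs.items()
                  if os.getenv(cls._env_name(key)) is None}
        return replace(self, **actual) if actual else self


os.environ["DEMO_DUMP_ENABLE"] = "1"
os.environ["DEMO_DUMP_MAX_STEPS"] = "8"
os.environ.pop("DEMO_DUMP_OUTPUT_DIR", None)

config = DemoDumpConfig.from_env().with_defaults(
    output_dir="/data/dumps", max_steps=100)
```

With these variables set, `max_steps` stays at 8 because the environment wins, while `output_dir` takes the code-level default because no variable was set for it. Keeping the dataclass frozen means `with_defaults` has to return a new object through `replace`, which keeps configs safe to share.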

Schedule Simulator: Simulating Scheduling Strategies

The Simulator class models GPU state and simulates the effect of different scheduling policies offline.

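class Simulator:
    def __init__(self, num_gpus_per_engine, router, scheduler,
                 recorders=None, max_total_tokens=100000,
                 stop_criteria="all_done", max_steps=None):
        # Abridged excerpt: store the attributes that run() and _should_stop() read.
        self.num_gpus_per_engine = num_gpus_per_engine
        self.router = router
        self.scheduler = scheduler
        self.max_total_tokens = max_total_tokens
        self.stop_criteria = stop_criteria
        self.max_steps = max_steps
        self.step = 0
        self.gpu_states: List[GPUState] = []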

    def run(self, requests: List[SimRequest]) -> SimulationResult:
        self.gpu_states = [
            GPUState(gpu_id=i, max_total_tokens=self.max_total_tokens)
            for i in range(self.num_gpus_per_engine)]
        # Abridged excerpt: the setup of these two locals is assumed.
        incoming_requests = list(requests)
        step_records = []

        while True:
            self._route_requests(incoming_requests)
            self._schedule_all_gpus()
            if self._should_stop():
                break
            self._execute_step()
            step_records.extend(
                gpu.get_step_record(self.step) for gpu in self.gpu_states)
        return SimulationResult(step_records=step_records, summary=self._get_summary())

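The simulation loop proceeds in the order routing -> scheduling -> execution -> recording. The stop condition is checked after scheduling and before execution.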

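def _should_stop(self) -> bool:
    # Hard cap on the number of simulated steps.
    if self.max_steps is not None and self.step >= self.max_steps:
        return True
    # "all_done": stop once no GPU has pending or running requests.
    if self.stop_criteria == "all_done":
        return not any(
            gpu.pending_requests or gpu.running_requests
            for gpu in self.gpu_states)
    return False  # explicit fall-through for this excerpt instead of an implicit None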
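
The route, schedule, stop-check, execute loop can be reproduced with a toy model small enough to trace by hand. The sketch below is an assumption-laden stand-in, not the SGLang simulator: `ToyRequest`, `ToyGPU`, and `simulate` are hypothetical, routing is a fixed round-robin done once up front, the scheduler is a strict FCFS that reserves each request's worst-case footprint, and a step generates one token per running request.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class ToyRequest:
    req_id: int
    input_len: int
    output_len: int
    generated: int = 0

    def finished(self) -> bool:
        return self.generated >= self.output_len


@dataclass
class ToyGPU:
    max_total_tokens: int
    pending: Deque[ToyRequest] = field(default_factory=deque)
    running: List[ToyRequest] = field(default_factory=list)

    def schedule_fcfs(self) -> None:
        # Admit from the head of the queue while the worst-case footprint fits.
        used = sum(r.input_len + r.output_len for r in self.running)
        while self.pending:
            head = self.pending[0]
            need = head.input_len + head.output_len
            if used + need > self.max_total_tokens:
                break  # strict FCFS: never skip past the head request
            used += need
            self.running.append(self.pending.popleft())

    def execute_step(self) -> None:
        # One decode step: every running request produces one token.
        for r in self.running:
            r.generated += 1
        self.running = [r for r in self.running if not r.finished()]


def simulate(requests: List[ToyRequest], num_gpus: int,
             max_total_tokens: int, max_steps: int = 10_000) -> int:
    """Return the number of steps executed until all requests finish."""
    gpus = [ToyGPU(max_total_tokens) for _ in range(num_gpus)]
    for i, req in enumerate(requests):  # route: round-robin, once up front
        gpus[i % num_gpus].pending.append(req)

    step = 0
    while step < max_steps:
        for gpu in gpus:
            gpu.schedule_fcfs()
        if not any(gpu.pending or gpu.running for gpu in gpus):
            break  # the "all_done" stop criterion
        for gpu in gpus:
            gpu.execute_step()
        step += 1
    return step


def make_requests() -> List[ToyRequest]:
    return [ToyRequest(0, 4, 3), ToyRequest(1, 4, 2), ToyRequest(2, 4, 3)]


steps_one_gpu = simulate(make_requests(), num_gpus=1, max_total_tokens=14)   # 5
steps_two_gpus = simulate(make_requests(), num_gpus=2, max_total_tokens=14)  # 3
```

On one GPU the third request has to wait until the second finishes and frees its reservation, so the run takes 5 steps. With two GPUs every request is admitted immediately and the run takes 3. Comparing such step counts across policies is the kind of question an offline simulator answers without touching real hardware.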

Usage Examples

Tensor Comparison CLI

# Compare two dump directories
python -m sglang.srt.debug_utils.comparator \
    --baseline /path/to/dump_a \
    --target /path/to/dump_b

# Override dims at comparison time
python -m sglang.srt.debug_utils.comparator \
    --override-dims "attn_output:b s h[tp] d"

Schedule Simulation

python -m sglang.srt.debug_utils.schedule_simulator \
    --num-gpus 4 \
    --scheduler fcfs \
    --max-total-tokens 100000

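Tool Comparison

Tool	Purpose	Input	Output
Comparator	Numerical accuracy verification	Two tensor dumps	Diff report, heatmap
Simulator	Scheduling analysis	Request list	Step records, summary
Dumper	Saving intermediate tensors	Model run	Tensor files
Model Truncator	Fast reproduction	Model path	Truncated model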

References

Source code: python/sglang/srt/debug_utils/
Polars: a high-performance DataFrame library

