[triton] Unifying per-backend global_scratch_alloc allocation

February 26, 2026 | Updated: February 26, 2026

PR link: triton-lang/triton#9536 | Status: Merged | Changes: +197 / -317

Introduction

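In Triton, global_scratch_alloc is the operation that allocates a global memory buffer from within a kernel. Previously, the allocation scheme was selected through a backend string attribute. This PR simplifies that to a third_party_allocation unit attribute and moves the Proton profiler's scratch memory into a separate profile_scratch_memory pool.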

Core code analysis

Before:

def TTG_GlobalScratchAllocOp : TTG_Op<"global_scratch_alloc"> {
  let arguments = (ins
    I32Attr:$nbytes,
    I32Attr:$alignment,
    DefaultValuedAttr<StrAttr, "\"default\"">:$backend
  );
};

After:

def TTG_GlobalScratchAllocOp : TTG_Op<"global_scratch_alloc"> {
  let arguments = (ins
    I32Attr:$nbytes,
    I32Attr:$alignment,
    OptionalAttr<UnitAttr>:$third_party_allocation
  );
};

During memory allocation, the two pools are managed independently:

ScratchMemoryInfo globalMemInfo;
ScratchMemoryInfo profileMemInfo;

if (auto alloc = dyn_cast<triton::gpu::GlobalScratchAllocOp>(op)) {
    bool isThirdPartyAlloc = alloc->hasAttr("third_party_allocation");
    ScratchMemoryInfo &memInfo =
        isThirdPartyAlloc ? profileMemInfo : globalMemInfo;
    assignOffset(op, builder, memInfo, alloc.getNbytes(),
                 alloc.getAlignment(), "ttg.global_scratch_memory_offset");
}
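The two-pool bookkeeping above can be pictured with a small standalone sketch. This is not Triton's implementation: ScratchPool, alignUp, reserve, and reserveFor are hypothetical names made up for illustration, and the only idea carried over from the snippet is that a boolean decides which pool's running offset advances.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ScratchMemoryInfo: tracks the bytes reserved so
// far and the largest alignment requested within a single pool.
struct ScratchPool {
  uint32_t size = 0;
  uint32_t alignment = 1;
};

// Round `value` up to the next multiple of `align` (must be a power of two).
inline uint32_t alignUp(uint32_t value, uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Reserve `nbytes` in `pool` and return the offset of the new buffer.
inline uint32_t reserve(ScratchPool &pool, uint32_t nbytes, uint32_t align) {
  uint32_t offset = alignUp(pool.size, align);
  pool.size = offset + nbytes;
  if (align > pool.alignment)
    pool.alignment = align;
  return offset;
}

// Route a request to the profile pool when the flag is set, otherwise to the
// kernel's own global pool. Each pool keeps its own offsets.
inline uint32_t reserveFor(bool thirdPartyAllocation, ScratchPool &globalPool,
                           ScratchPool &profilePool, uint32_t nbytes,
                           uint32_t align) {
  return reserve(thirdPartyAllocation ? profilePool : globalPool, nbytes, align);
}
```

In this sketch, a profiler request never shifts the offsets handed out from the kernel's own pool, which is the property the separation is meant to provide.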

Why this is better

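The old backend string was open-ended: it could hold any value, which made the comparison logic fragile. The third_party_allocation unit attribute is either present or absent, so only two states are possible and the check is type-safe. Moving profile scratch into its own pool keeps profiling overhead out of the kernel's actual scratch usage, so that kernel behavior does not change depending on whether profiling is enabled. CallOp likewise passes the two offsets independently.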
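To make the fragility argument concrete, here is a minimal sketch contrasting the two styles. It assumes a hypothetical string value "proton" for the old attribute (the source does not say which strings were actually used), and Pool, poolFromBackendString, and poolFromFlag are illustrative names rather than Triton APIs.

```cpp
#include <string>

enum class Pool { Global, Profile };

// Old style (assumed): an open-ended string tag. Any unrecognized spelling
// falls through to the default pool without an error.
inline Pool poolFromBackendString(const std::string &backend) {
  return backend == "proton" ? Pool::Profile : Pool::Global;
}

// New style: a presence flag, mirroring a unit attribute that is either set
// or not set. There is no third state to mishandle.
inline Pool poolFromFlag(bool thirdPartyAllocation) {
  return thirdPartyAllocation ? Pool::Profile : Pool::Global;
}
```

With the string version, a misspelled tag such as "Proton" is silently routed to the global pool, whereas the flag version has no spelling to get wrong.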

Summary

The PR replaces the backend string on global_scratch_alloc with a third_party_allocation unit attribute and moves profile scratch memory into a separate pool, unifying how allocations are managed.

References

triton-lang/triton#9536

This analysis was written by an AI based on the actual code diff.
