[Triton] Moving Gluon verification logic into a C++ verifier: supporting dimension-reducing loads

December 18, 2025 (updated: December 18, 2025)

PR link: triton-lang/triton#9033 | Status: Merged | Changes: +43 / -38

Introduction

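TMA (Tensor Memory Accelerator) operations in the Gluon DSL require a check that the tensor descriptor's layout matches the shared-memory layout. Previously this check was implemented as a Python assert, and the assert was too strict: it rejected dimension-reducing loads, in which the descriptor's block has a higher rank than the shared-memory destination.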

This PR moves the check into the C++ verifier, where it can be made more precise.

Key code analysis

Before: an overly strict Python assert

# Strict check at the Python level
assert tensor_desc.layout == smem.layout
# In a dimension-reducing load, tensor_desc and smem can have different ranks,
# so their layouts cannot be compared directly and the assert fails
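
To make the failure mode concrete, here is a minimal Python sketch. It assumes, for illustration only, that a layout object records the rank of the tensor it describes; `ToyLayout` and `old_python_check` are hypothetical names, not Gluon APIs. A descriptor block of shape [1, 64, 64] loaded into a [64, 64] buffer has identical swizzling parameters but a different rank, so strict equality rejects it.

```python
from dataclasses import dataclass


# Hypothetical toy stand-in for a shared-memory layout (not Gluon's real class).
@dataclass(frozen=True)
class ToyLayout:
    swizzle_bytes: int
    element_bits: int
    rank: int  # assumption: the layout records the rank of the tensor it describes


def old_python_check(desc_layout: ToyLayout, smem_layout: ToyLayout) -> None:
    # Modeled on the old check: exact equality, so the rank field must match too.
    assert desc_layout == smem_layout, "layout mismatch"


# Descriptor block [1, 64, 64] loaded into a [64, 64] shared-memory buffer.
desc_layout = ToyLayout(swizzle_bytes=128, element_bits=16, rank=3)
smem_layout = ToyLayout(swizzle_bytes=128, element_bits=16, rank=2)

try:
    old_python_check(desc_layout, smem_layout)
    rejected = False
except AssertionError:
    rejected = True

print(rejected)  # True: same swizzling, different rank
```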

After: precise validation in the C++ verifier

// C++ verifier: correctly handles the case where the ranks differ
LogicalResult AsyncTMACopyGlobalToLocalOp::verify() {
  auto descType = getDescPtr().getType();
  auto smemType = getResult().getType();

  // Same rank: compare the layouts directly
  if (descType.getRank() == smemType.getRank()) {
    if (descType.getEncoding() != smemType.getEncoding())
      return emitError("layout mismatch");
  }
  // Different ranks (dimension-reducing): check compatibility
  else {
    if (!isCompatibleReducedLayout(descType, smemType))
      return emitError("incompatible reduced layout");
  }
  return success();
}
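
The verifier's two branches can be modeled in Python as below. This is a sketch under a stated assumption: the article does not show `isCompatibleReducedLayout`, so the rule used here (only leading size-1 dimensions may be dropped, the remaining shape must match, and the layout must agree apart from its rank) is hypothetical, as are `ToyType`, `ToyLayout`, and `is_compatible_reduced_layout`. The returned strings mirror the `emitError` messages above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical toy types; the compatibility rule below is an assumption.
@dataclass(frozen=True)
class ToyLayout:
    swizzle_bytes: int
    element_bits: int
    rank: int


@dataclass(frozen=True)
class ToyType:
    shape: Tuple[int, ...]
    layout: ToyLayout

    @property
    def rank(self) -> int:
        return len(self.shape)


def is_compatible_reduced_layout(desc: ToyType, smem: ToyType) -> bool:
    """Assumed rule: drop leading size-1 dims of desc, then compare."""
    dropped = desc.rank - smem.rank
    if dropped <= 0:
        return False  # smem must have strictly lower rank
    if any(dim != 1 for dim in desc.shape[:dropped]):
        return False  # only leading unit dimensions may be removed
    if desc.shape[dropped:] != smem.shape:
        return False  # the remaining shape must match exactly
    # Apart from the rank, the layout parameters must agree.
    return replace(desc.layout, rank=smem.rank) == smem.layout


def verify(desc: ToyType, smem: ToyType) -> str:
    if desc.rank == smem.rank:
        # Same rank: direct layout comparison, as before.
        return "success" if desc.layout == smem.layout else "layout mismatch"
    # Different ranks: the dimension-reducing compatibility check.
    if not is_compatible_reduced_layout(desc, smem):
        return "incompatible reduced layout"
    return "success"


desc = ToyType((1, 64, 64), ToyLayout(128, 16, 3))
smem = ToyType((64, 64), ToyLayout(128, 16, 2))
print(verify(desc, smem))  # success
```

Keeping the same-rank branch as plain equality means ordinary loads are checked exactly as strictly as before; only the reduced-rank case goes through the relaxed compatibility rule.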

On the Python side, the assert is removed and the frontend concentrates solely on generating IR:

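# After: the assert is removed and validation is delegated to the C++ verifier
# Python is responsible only for IR generation
builder.create_async_tma_copy_global_to_local(
    desc, coords, barrier, smem, pred, multicast)
# The C++ verifier checks the layouts automatically once the IR is built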
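
The division of labor, where the builder only constructs ops and verification runs afterwards over the finished IR, looks roughly like the toy model below. `ToyOp`, `ToyBuilder`, and `verify_module` are hypothetical; they only stand in for an MLIR op's `verify()` hook and module-level verification.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyOp:
    name: str
    # Each op carries its own invariant check, like an MLIR op's verify() hook.
    verify: Callable[[], str]


@dataclass
class ToyBuilder:
    ops: List[ToyOp] = field(default_factory=list)

    def create(self, op: ToyOp) -> ToyOp:
        # The builder only records the op; it performs no layout checks itself.
        self.ops.append(op)
        return op


def verify_module(ops: List[ToyOp]) -> List[str]:
    """Run every op's verifier once construction is done; collect failures."""
    errors = []
    for op in ops:
        result = op.verify()
        if result != "success":
            errors.append(f"{op.name}: {result}")
    return errors


builder = ToyBuilder()
builder.create(ToyOp("tma_copy_reduced", lambda: "success"))
builder.create(ToyOp("tma_copy_bad", lambda: "layout mismatch"))
errors = verify_module(builder.ops)
print(errors)  # ['tma_copy_bad: layout mismatch']
```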

Why this is better

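Dimension-reducing load support: cases where the tensor descriptor's rank (e.g., 3D) differs from the shared-memory rank (e.g., 2D) are now handled correctly.
Better placement of the check: a Python assert runs only when the op is constructed, whereas the C++ verifier also runs automatically after IR transformations, so it covers more of the compilation pipeline.
Better error messages: C++ emitError attaches the op's source location, which makes debugging easier.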
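
The coverage argument in the second point can be illustrated with a toy pass pipeline that re-verifies the IR after every pass, similar in spirit to MLIR's pass manager, which by default runs the verifier after each pass. The op dictionaries, `verify_op`, and the passes are hypothetical. A construction-time assert would have accepted this IR, because the bad layout is only introduced later by `buggy_relayout`.

```python
from typing import Callable, Dict, List

# Hypothetical toy IR: each "op" is a dict of ranks and layout names.
Op = Dict[str, object]


def verify_op(op: Op) -> bool:
    # Hypothetical invariant: equal ranks require identical layouts.
    if op["desc_rank"] == op["smem_rank"]:
        return op["desc_layout"] == op["smem_layout"]
    return True  # the reduced-rank case is out of scope for this toy


def run_pipeline(ops: List[Op], passes: List[Callable[[List[Op]], List[Op]]]) -> List[str]:
    log = []
    for p in passes:
        ops = p(ops)
        # Verify after every pass, not only when the ops were first built.
        bad = [i for i, op in enumerate(ops) if not verify_op(op)]
        status = "ok" if not bad else f"failed at ops {bad}"
        log.append(f"{p.__name__}: {status}")
        if bad:
            break  # stop the pipeline at the first invalid IR
    return log


def canonicalize(ops: List[Op]) -> List[Op]:
    return ops


def buggy_relayout(ops: List[Op]) -> List[Op]:
    # A faulty rewrite that changes only the shared-memory layout.
    return [{**op, "smem_layout": "swizzle64"} for op in ops]


ops = [{"desc_rank": 2, "smem_rank": 2,
        "desc_layout": "swizzle128", "smem_layout": "swizzle128"}]
log = run_pipeline(ops, [canonicalize, buggy_relayout, canonicalize])
print(log)  # ['canonicalize: ok', 'buggy_relayout: failed at ops [0]']
```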

Summary

This PR moves Gluon's TMA layout check from a Python assert into a C++ verifier. As a result, dimension-reducing loads are supported correctly, and the check becomes both more accurate and broader in coverage.

References

This post was written with the help of AI (Claude). The key code and explanations are based on the actual PR diff.
