[Triton] TensorMemory layout support in Gluon's to_linear_layout

November 21, 2025 | Updated: November 21, 2025

PR link: triton-lang/triton#8682 | Status: Merged | Changes: +100 / -47

Introduction

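Triton's Gluon frontend converts various encodings into a LinearLayout through the to_linear_layout function. Previously it supported only DistributedLayout and SharedLayout; this PR extends it to also handle the TensorMemory encodings of NVIDIA Blackwell. This matters for TMEM-related debugging, where being able to print a layout in linear form is essential.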

Core code analysis

Extending the C++ layer (gluon_ir.cc)

Before:

auto linearLayout = ttg::toLinearLayout(shape, layout);
auto attr = ttg::LinearEncodingAttr::get(ctx, linearLayout);
return layoutToGluon(attr);

After:

auto linearLayout = ttg::toLinearLayout(shape, layout);

if (isa<ttg::DistributedEncodingTrait>(layout)) {
  auto attr = ttg::LinearEncodingAttr::get(ctx, linearLayout);
  return layoutToGluon(attr);
}
if (isa<ttg::SharedEncodingTrait>(layout)) {
  auto alignment = cast<ttg::SharedEncodingTrait>(layout).getAlignment();
  auto attr = ttg::SharedLinearEncodingAttr::get(ctx, linearLayout, alignment);
  return layoutToGluon(attr);
}
// TensorMemory: wrap as print-only Python object

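The code now creates the appropriate LinearLayout wrapper for each encoding type: a LinearEncodingAttr for distributed encodings, and a SharedLinearEncodingAttr (which also carries the alignment) for shared encodings. For TensorMemory, the result is wrapped in a _TensorMemoryLinearLayout object on the Python side, which is intended for printing only.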
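To make the three-way dispatch concrete, here is a minimal Python sketch of the same pattern. It does not use Triton: the classes DistributedEncoding, SharedEncoding, TensorMemoryEncoding, and PrintOnlyLinearLayout, as well as the function wrap_linear_layout, are hypothetical stand-ins invented for illustration, and the linear layout is represented as a plain string. In particular, PrintOnlyLinearLayout is not the actual definition of _TensorMemoryLinearLayout; it only illustrates what a "print-only" wrapper might look like.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the three encoding families (not Triton classes).
class DistributedEncoding:
    pass


@dataclass
class SharedEncoding:
    alignment: int = 16  # shared encodings carry an alignment


class TensorMemoryEncoding:
    pass


@dataclass(frozen=True)
class PrintOnlyLinearLayout:
    """Hypothetical print-only wrapper: holds a description, offers no operations."""
    description: str

    def __str__(self) -> str:
        return f"TensorMemoryLinearLayout({self.description})"


def wrap_linear_layout(layout, linear_layout: str):
    """Pick a wrapper for an already-computed linear layout based on the encoding type."""
    if isinstance(layout, DistributedEncoding):
        return ("linear", linear_layout)
    if isinstance(layout, SharedEncoding):
        # The shared branch also forwards the alignment of the source encoding.
        return ("shared_linear", linear_layout, layout.alignment)
    if isinstance(layout, TensorMemoryEncoding):
        return PrintOnlyLinearLayout(linear_layout)
    raise TypeError(f"Unsupported layout type: {type(layout).__name__}")
```

The point of the sketch is that the linear layout is computed once, and only the wrapper differs per encoding family, which mirrors the structure of the After snippet.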

Extending the Python semantic layer

_check(
    isinstance(layout, (DistributedLayout, SharedLayout,
                        TensorMemoryLayout, TensorMemoryScalesLayout)),
    lambda: f"Expected a supported layout type, got {type(layout)}"
)
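The check above widens the set of accepted layout types. The following self-contained sketch shows how such a guard behaves. The four layout classes are empty stand-ins, and both this _check helper and validate_layout are hypothetical reimplementations written for this example; they are assumptions, not Gluon's actual definitions.

```python
# Empty stand-ins for the Gluon layout classes (assumptions for illustration).
class DistributedLayout:
    pass


class SharedLayout:
    pass


class TensorMemoryLayout:
    pass


class TensorMemoryScalesLayout:
    pass


def _check(cond: bool, msg_fn) -> None:
    """Hypothetical helper: raise with a lazily built message when cond is false."""
    if not cond:
        raise TypeError(msg_fn())


def validate_layout(layout):
    """Accept only the four supported layout families and return the layout unchanged."""
    _check(
        isinstance(layout, (DistributedLayout, SharedLayout,
                            TensorMemoryLayout, TensorMemoryScalesLayout)),
        lambda: f"Expected a supported layout type, got {type(layout)}",
    )
    return layout
```

Passing the message as a lambda means the f-string is only formatted on failure, so the success path does no string work.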