#TensorMemory

4 posts

[triton] Automatic Register Layout Inference in Gluon tmem_load

Analyzes a BC-breaking change that removes the get_tmem_reg_layout call and instead infers the register layout automatically from the tensor memory descriptor.

#Triton #Gluon #NVIDIA #Blackwell #TensorMemory

February 28, 2026

[triton] Fixing an FPSan Crash When Using Warp Specialization + TMem

Analyzes a fix for a crash in the Floating-point Sanitizer, which referenced values from outside the scope when accessing tensor memory inside a WarpSpecialize partition.

#Triton #FPSan #NVIDIA #WarpSpecialize #TensorMemory #BugFix

February 9, 2026

[Triton] Adding Support for M=64 2CTA Mode

Adds support for 2CTA mode with the M=64 instruction shape on the Blackwell architecture, expanding TensorMemory layout flexibility.

#Triton #NVIDIA #Blackwell #CTA #TensorMemory

January 18, 2026

[Triton] TensorMemory Layout Support in Gluon's to_linear_layout

Extends the to_linear_layout function to handle TensorMemory encodings in addition to Distributed and Shared encodings.

#Triton #Gluon #NVIDIA #TensorMemory #LinearLayout

November 21, 2025