#TDM

10 posts

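[triton] Partition-Aware Splitting and Multi-Intrinsic Support for AMD TDM

Analyzes a change that aligns TDM warp distribution with partition boundaries in PartitionedSharedEncoding and correctly handles both the generation of multiple TDM instructions and the resulting wait count calculation.

#Triton #AMD #GPU #TDM #WarpDistribution

March 28, 2026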

[triton] Optimizing Output Stores with TDM Store in the AMD MXFP FA Example

Analyzes how replacing buffer_store-based manual layout management with TDM store simplified the code and improved memory access efficiency.

#Triton #AMD #GPU #TDM #FlashAttention

March 23, 2026

[triton] Fixing a Buffer Race in TDM Loads in AMD Pipelined Loops

Analysis of a PR that fixes a data race caused by allocating too few buffers when TDM loads are used in pipelined loops on AMD GPUs.

#Triton #AMD #TDM #Pipeline #BufferRace #BugFix

March 14, 2026

[triton] TDM Software Pipelining Support on AMD GFX1250

Analyzes a PR that integrates asynchronous copies based on the Tensor Data Mover (TDM) into software pipelining on the AMD GFX1250 target to improve matmul performance.

#Triton #AMD GPU #GFX1250 #TDM #Software Pipelining

February 17, 2026

[Triton] Supporting AMD TDM AsyncWait in UpdateAsyncWaitCount

Adds accurate waitcnt calculation for cases where a TDM scatter/gather generates multiple intrinsics.

#Triton #AMD #TDM #Async Wait #Compiler

February 2, 2026

[triton] Adding Tensor Async Gather (TDM) Support to Gluon on AMD gfx1250

Analysis of a PR that adds to Gluon the ability to asynchronously read data from non-contiguous global memory rows using the TDM gather mode of AMD gfx1250 GPUs.

#Triton #AMD #gfx1250 #Gluon #TDM #Gather

February 1, 2026

[triton] Adding Tensor Async Scatter Support to Gluon on AMD gfx1250

Analysis of a PR that adds to Gluon the ability to asynchronously write data to non-contiguous global memory rows using the TDM scatter mode of AMD gfx1250 GPUs.

#Triton #AMD #gfx1250 #Gluon #TDM #Scatter

January 26, 2026

[Triton] Enabling AMD TDM Features and Adding the ConvertToTensorOps Pass

Enables TDM (Tensor Data Mover) features and adds the ConvertToTensorOps conversion pass.

#Triton #AMD #TDM #Tensor Descriptor #Compiler Pass

January 23, 2026

[Triton] Adding Multi-CTA and Multicast Support to AMD TDM Operations

Automatically sets multicast masks on TDM load/store based on CGALayout, allowing data to be shared among the CTAs of a cluster.

#Triton #AMD #TDM #Multi-CTA #Multicast

November 24, 2025

[Triton] Adding TDM Store Support on gfx1250

Implements asynchronous shared-to-global store operations via the Tensor Data Mover on the AMD gfx1250 target.

#Triton #AMD #gfx1250 #TDM #Async

October 9, 2025