PR Analysis

[uvloop] Fix the _ready_len race condition

Replaces the manually maintained _ready_len counter with direct len(self._ready) calls, eliminating the race condition.

#uvloop #Race Condition #Event Loop #Cython

January 19, 2026

[llm-compressor] Memoryless Observers - memory-efficient weight observers

Switches the weight observers used in quantization calibration to a memoryless design, substantially reducing memory usage.

#llm-compressor #Performance

January 19, 2026

[Triton] Add support for M=64 2CTA mode

Supports 2CTA mode for the M=64 instruction shape on the Blackwell architecture, making TensorMemory layouts more flexible.

#Triton #NVIDIA #Blackwell #CTA #TensorMemory

January 18, 2026

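[triton] [Blackwell] Analyzing Triton's tcgen05.ld.red optimization for NVIDIA's next-generation architecture

Analyzes how Blackwell's ability to perform a TMEM load and a reduction at the same time was implemented in Triton Gluon to improve performance.

#Triton #Blackwell #NVIDIA #GPU #Optimization #MLIR

January 16, 2026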

[Loki] Consolidating the memory subpackages to improve code structure

Merges memory/bitmap and memory/buffer into the memory package to remove duplication.

#Grafana Loki #Go #Refactoring #Performance

January 16, 2026

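[Ray Serve] Optimizing replica routing data structures: replacing O(n) scans with O(1) dictionary lookups

A comprehensive analysis of an optimization to Ray Serve's request router that replaces O(n) linear scans with O(1) dictionary indexes and adds hash caching and metric throttling.

#Ray #Python #Performance #Data Structures #Serving

January 16, 2026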
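
The scan-to-index idea in the Ray Serve entry above can be illustrated with a minimal sketch. This is not Ray Serve's router code: `Replica`, `ScanRegistry`, and `IndexedRegistry` are hypothetical names chosen for illustration, and the sketch only shows the general trade of a linear search for a dictionary keyed by replica ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical stand-in for a replica record; not a Ray Serve type.
    replica_id: str
    node: str


class ScanRegistry:
    """Looks a replica up by walking the whole list: O(n) per lookup."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def get(self, replica_id):
        for replica in self._replicas:
            if replica.replica_id == replica_id:
                return replica
        return None


class IndexedRegistry:
    """Builds a dict index once, so each lookup is O(1) on average."""

    def __init__(self, replicas):
        self._by_id = {replica.replica_id: replica for replica in replicas}

    def get(self, replica_id):
        return self._by_id.get(replica_id)


replicas = [Replica(f"r{i}", f"node{i % 3}") for i in range(1000)]
scan = ScanRegistry(replicas)
indexed = IndexedRegistry(replicas)
```

Both registries return the same replica for the same ID; the indexed version pays the cost of building the dictionary up front and must keep it in sync whenever the replica set changes.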

[Triton] TritonGPU barrier redesign: guaranteeing memory visibility per address space

Replaces gpu.barrier with a TritonGPU-specific barrier op for fine-grained control over shared/global memory visibility.

#Triton #MLIR #GPU Barrier #Memory Visibility #Compiler Infrastructure

January 16, 2026

[triton] Warp Specialization: an improved partition scheduling pass based on a data flow graph

Analyzes a rewrite of the existing partition scheduling around a data flow graph and incremental heuristic merging, which makes the pass more general.

#Triton #Warp Specialization #Partition Scheduling #Data Flow Graph #Compiler #MLIR

January 16, 2026

[Loki] Delta decoder optimization for a 3x throughput improvement

Removes the streamio.Reader interface and accesses the byte slice directly, improving delta decoder performance by 60%.

#Grafana Loki #Go #Performance #Encoding #Data Pipeline

January 15, 2026

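[triton] Revert of the PR that removed the moveUpTranspose optimization - preventing regressions

Analyzes the PR that reverts the removal of moveUpTranspose, which caused performance regressions in some workloads, and thereby restores the TransposeOp repositioning optimization.

#Triton #AMD #Revert #Performance #Regression

January 15, 2026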

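[Loki] Data object Plain Value decoder optimization improves throughput by 93%

Analyzes an optimization in Grafana Loki's dataobj that rewrites the Plain Value decoder with an Arrow-style memory representation, []byte-based decoding, and minimal pointer indirection, improving decoding throughput by 93%.

#Grafana Loki #Go #Performance #Decoder #Memory Optimization #Benchmark

January 15, 2026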

[Triton] Add an AMD fine-grained cluster barrier and expose it in Gluon

Adds cluster barrier arrive/wait operations to the AMD backend for synchronizing execution across CTAs.

#Triton #AMD #Gluon #Multi-CTA #Synchronization

January 15, 2026

[uvloop] Replace the deprecated asyncio.iscoroutinefunction with the inspect module

Migrates from asyncio.iscoroutinefunction, which is deprecated as of Python 3.14, to inspect.iscoroutinefunction.

#uvloop #Python #asyncio #Deprecation #Migration

January 14, 2026
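
For the uvloop migration entry above, a minimal sketch of the replacement call looks like the following. It uses only the standard library's `inspect.iscoroutinefunction`; the functions `fetch` and `plain` are hypothetical examples and do not come from uvloop.

```python
import inspect


async def fetch():
    # Hypothetical coroutine function used only for the check below.
    return 1


def plain():
    # Hypothetical regular function used only for the check below.
    return 1


# inspect.iscoroutinefunction reports whether a callable was defined
# with "async def", which is the check the migration switches to.
is_async = inspect.iscoroutinefunction(fetch)
is_plain_async = inspect.iscoroutinefunction(plain)
```

Here `is_async` is `True` and `is_plain_async` is `False`, so call sites that branch on the result should behave the same after the switch for ordinary functions like these.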

[Triton] Selective kernel metadata recording and custom metric support in Proton

Adds include/exclude filters and arbitrary metric support to LaunchHook, making profiling more flexible.

#Triton #Proton #Profiler #Metadata #Performance

January 15, 2026

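[Loki] Add Plain decoder benchmarks and clean up the code

Analysis of a PR that adds systematic benchmarks for the plain bytes decoder in Loki's dataobj and removes unnecessary condition checks to improve decoding performance.

#Grafana Loki #Go #Benchmarking #Decoder #Data Object #Performance

January 14, 2026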

[Grafana Loki] Introducing an experimental arena-style memory package for pkg/dataobj

Analyzes the newly introduced arena-style Allocator, which can reclaim and reuse memory regions, along with new bitmap/buffer utilities.

#Grafana Loki #Go #Memory Management #Arena Allocator #Performance #Bitmap

January 14, 2026

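[Grafana Loki] Improving the delta decoder benchmarks and measuring Decode method performance

Analyzes a change that rewrites the single-value decode benchmark as a batch-level Decode method benchmark and adds a throughput metric and an errors.Is optimization.

#Grafana Loki #Go #Performance #Benchmark #Encoding

January 14, 2026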

[triton] AMD: apply the padded shared layout to smaller block sizes to eliminate bank conflicts

Analyzes a change that supports bank-conflict-free layouts using LDS padding even for small blocks under 16KB.

#Triton #AMD #GPU #LDS #Bank Conflict #Shared Memory

January 13, 2026

[pytorch] CI: add an IoU-based accuracy check to Inductor tests to stabilize segmentation models

Analyzes how applying the IoU (Intersection over Union) metric to the boolean mask outputs of segmentation models in the PyTorch Inductor benchmarks prevents false failures caused by floating-point differences.

#PyTorch #Inductor #Benchmarks #IoU #Segmentation #Accuracy #CI

January 12, 2026
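
To show why IoU is a more forgiving check than exact equality for the boolean masks mentioned in the PyTorch entry above, here is a small sketch in plain Python. It is not the Inductor benchmark code: `mask_iou`, `masks_match`, and the 0.95 threshold are assumptions made for illustration, and the masks are flat lists rather than tensors.

```python
def mask_iou(a, b):
    """Intersection over Union of two equal-length boolean masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty masks agree completely, so treat that case as IoU 1.0.
    return 1.0 if union == 0 else intersection / union


def masks_match(a, b, threshold=0.95):
    # The threshold is an assumed value, not one taken from the PR.
    return mask_iou(a, b) >= threshold


# A reference mask and an output that differs in a single pixel, as could
# happen when a tiny floating-point difference flips one thresholded value.
reference = [True] * 99 + [False] * 101
actual = list(reference)
actual[99] = True
```

An exact comparison of `reference` and `actual` fails because of the single flipped element, while the IoU stays high enough for `masks_match` to accept the pair.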

[Triton] Improve and simplify ReduceOp lowering with LinearLayout

Redesigns ReduceOp lowering around LinearLayout to take advantage of shmem swizzling and remove unnecessary round-trips.

#Triton #MLIR #Compiler Optimization #LinearLayout #Refactoring

January 12, 2026