Latest Posts

[Paper Review] Demystifying Video Reasoning

Diffusion-based video generation models have recently been found to exhibit non-trivial reasoning ability in spatiotemporally consistent visual environments.

#Review #Video Reasoning #Diffusion Models #Chain-of-Steps #Emergent Behaviors #Layer Specialization #Training-Free Ensemble

March 17, 2026

[Paper Review] AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

As LLMs have evolved into tool-using agents, their ability to interact with external environments has improved substantially, but they remain fragile in long-horizon interactions.

#Review #Large language models #Process reward models #Tool-using agents #Step-level evaluation #Agent trajectories #Benchmark

March 17, 2026

[Triton] MXGEMM Gluon kernel update for GFX1250

March 18, 2026

[Triton] Applying LDS reduction to the MXFP Flash Attention example

March 18, 2026

[Triton] Cleaning up builtin usage in translator_helpers

March 18, 2026

[triton] Unifying the padded layout heuristics for the async copy and TDM paths on AMD gfx1250

An analysis of a PR that unifies the heuristics for selecting the padded shared memory layout used in the async copy and TDM load paths on AMD gfx1250 GPUs.

#Triton #AMD #gfx1250 #SharedMemory #Padding #BankConflict

March 17, 2026

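[Ray Data] Accelerating GPU data processing pipelines with RAPIDS MPF-based GPU shuffle support

An analysis of a feature that adds GPUShuffleOperator, which performs hash shuffles directly in GPU memory without going through the CPU, to accelerate large-scale distributed GPU data processing.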

#Ray #Python #Performance #GPU #Distributed Systems

March 17, 2026

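[axolotl] Async GRPO support: accelerating RLHF training with asynchronous vLLM generation and importance sampling

An analysis of a large feature addition that introduces Async GRPO to axolotl, running vLLM generation in parallel with training and correcting for distribution shift with importance sampling.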

#Axolotl #GRPO #RLHF #vLLM #Async Training #LoRA

March 17, 2026

[llm-compressor] Intermediates Cache Prefetch - prefetching intermediate results

Prefetches the intermediate results of quantization calibration to reduce wait time in sequential layer processing.

#llm-compressor #Performance

March 17, 2026

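[ray] Fixing a Ray Serve P99 latency regression: queue length cache never decremented

Fixes a P99 latency regression caused by a queue length cache that was only ever incremented and never decremented.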

#Python #Ray Serve #Performance #Bug Fix #Distributed Systems

March 17, 2026

[pytest] Removing the dirty optimization in request.getfixturevalue()

Removes an unnecessary optimization that added dynamically requested fixtures to arg2fixturedefs, and changes the type to Mapping.

#Python #pytest #Fixtures #Refactoring #Code Quality

March 17, 2026

[axolotl] Upgrading to transformers 5.3.0 / TRL 0.29.0: handling API changes and deprecated settings

An analysis of how the breaking changes arising from the major dependency upgrade to transformers 5.3.0 and TRL 0.29.0 were handled systematically.

#Axolotl #Transformers #TRL #Dependency Upgrade #Migration

March 16, 2026

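[Ultralytics] Removing the no-longer-valid INT8 batch doubling reference from the TensorRT docs

Updates the related documentation after the behavior that automatically doubled the batch size during INT8 calibration was removed.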

#Ultralytics #TensorRT #INT8 #Quantization #Documentation

March 16, 2026

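[triton] Fixing intermittent SIGABRT crashes in forked subprocesses

An analysis of a PR that resolves intermittent SIGABRTs, caused by LLVM's internal parallelism not being fork-safe, by disabling the LLVM thread pool.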

#Triton #LLVM #Fork #SIGABRT #Threading #BugFix

March 16, 2026

[triton] Enabling buffer atomic operations on AMD GFX1250

An analysis of a change that adds buffer atomic RMW/CAS support on the GFX1250 architecture and implements the SCOPE_DEV cache policy and packed bf16 fadd.

#Triton #AMD #GPU #GFX1250 #Atomics

March 16, 2026

[triton] Fixing a PTX codegen segfault on consumer Blackwell (sm_120)

An analysis of a fix for a runtime segfault caused by using the sm_120a suffix on consumer Blackwell GPUs such as the RTX 5070 Ti.

#Triton #NVIDIA #GPU #Blackwell #PTX #BugFix

March 16, 2026

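[Ray Core] OOM Killer now terminates idle workers holding large amounts of memory first

An analysis of a change that improves the OOM Killer, which previously terminated only workers with assigned tasks when memory ran low, so that it first terminates idle workers holding large amounts of memory.

#Ray #C++ #Performance #OOM #Memory Management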

March 16, 2026

[axolotl] FSDP CPU RAM Efficient Loading patch: preventing unnecessary weight initialization on non-rank-0 processes

An analysis of a monkeypatch that fixes non-rank-0 processes reinitializing their weights when cpu_ram_efficient_loading is used in FSDP distributed training.

#Axolotl #FSDP #Distributed Training #Memory Optimization #Monkeypatch

March 16, 2026

[vllm] FlashInfer MoE A2A Kernel - NVLink-based expert parallelism communication

Integrates FlashInfer's NVLink two-sided/one-sided All-to-All kernels to accelerate expert-parallel communication for MoE models.

#vllm #Performance

March 16, 2026

[Paper Review] daVinci-Env: Open SWE Environment Synthesis at Scale

Advances in Large Language Models (LLMs) are accelerating the development of autonomous Software Engineering (SWE) agents, but training such agents effectively requires large-scale environments that are both executable and verifiable.

#Review #SWE Agents #Environment Synthesis #Large Language Models #Dockerfile #SWE-Bench Verified #Data Scaling #Quality Curation

March 15, 2026