Latest Posts

[Triton] Stricter cga_layout barrier verification for tcgen05 ops

March 24, 2026

[Triton] Fixing a deadlock in warp-specialized kernels under GSan (Global Sanitizer)

March 24, 2026

[Triton] Adding nvfp4 x nvfp4 and mxfp4 x mxfp4 support to the matmul kernel

March 24, 2026

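[Triton] Optimizing output stores with TDM store in the AMD MXFP FA example

An analysis of a case where buffer_store-based manual layout management was replaced with TDM store, simplifying the code and improving memory access efficiency.

#Triton #AMD #GPU #TDM #FlashAttention

March 23, 2026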

[sglang] Introducing HiSparse: efficient KV cache management for sparse attention models

HiSparse stores idle KV cache in CPU memory, maximizing batch size and throughput for sparse attention models such as DeepSeek-V3.

#SGLang #LLM #KV Cache #Sparse Attention #CUDA

March 23, 2026

[Ray] Actor Pool Map Operator scheduler overhead reduced by 57%

An analysis of a PR that achieved a 57% performance improvement in the Ray Data actor pool scheduler with 500+ actors, through protobuf enum caching, fewer dict lookups, and constant hoisting.

#Ray #Ray Data #Actor Pool #Python Optimization #Protobuf #Performance

March 23, 2026

[vllm] ViT Full CUDA Graph: full CUDA Graph support for the vision encoder

Introduces EncoderCudaGraphManager to implement CUDA Graph capture/replay for the ViT encoder, accelerating vision model inference.

#vllm #Performance

March 23, 2026

[Ultralytics] Vectorizing the preprocess step of detect/obb loss computation to speed up training

Replaces the per-batch for loop with scatter_add-based vectorized operations to accelerate the preprocess stage of the detect/obb loss.

#Ultralytics #YOLO #PyTorch #Vectorization #Performance

March 22, 2026

[Paper Review] s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

This paper proposes a practical benchmark for evaluating whether LLMs can verify industrial cryptographic assembly code, combining formal specifications extracted from AWS's s2n-bignum library with HOL Light proof-script generation tasks.

#Review #Formal Verification #Theorem Proving #HOL Light #LLM for Code #Cryptographic Assembly #Neurosymbolic AI

March 22, 2026

[Paper Review] WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

Recent 2D foundation models have demonstrated remarkable high-fidelity image generation and deep semantic understanding through text-to-image diffusion.

#Review #3D World Generation #Foundation Models #Multi-Agent System #Vision-Language Models #3D Consistency #Gaussian Splatting

March 22, 2026

[Paper Review] Versatile Editing of Video Content, Actions, and Dynamics without Training

Despite recent advances in generative video models, non-rigid, dynamic manipulation remains a major challenge, whether that means editing actions or dynamic events in real videos or making inserted content influence the behavior of other objects.

#Review #Video Editing #Training-Free #Inversion-Free #Rectified Flow Models #Similarity Guided Aggregation (SGA) #Annealed Noise Correlation (ANC) #Text-to-Video Flow Models #Dynamic Manipulation

March 22, 2026

[Paper Review] TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Despite the promise of Vision-Language Models (VLMs) in Earth Observation (EO), existing VLMs struggle with precise pixel-level spatial reasoning and with integrating multi-sensor and multi-temporal data.

#Review #Vision-Language Models (VLMs) #Earth Observation (EO) #Pixel-Grounded Reasoning #Chain-of-Thought (CoT) #Multi-Modal Reasoning #Multi-Temporal Reasoning #Geospatial Reasoning

March 22, 2026

[Paper Review] TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Automatically generating a photorealistic, self-consistent appearance for untextured 3D models is an important challenge in digital content creation.

#Review #Video Generation #3D Texturing #Geometric Consistency #Turntable Video #Diffusion Models #Neural Rendering

March 22, 2026

[Paper Review] ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models

This paper proposes ProactiveBench, which restructures seven datasets to evaluate whether MLLMs exhibit "proactiveness", the ability to first ask the user for simple help on difficult visual tasks, and analyzes 22 MLLMs.

#Review #MLLM #Benchmark #Proactiveness #Reinforcement Learning #Multimodal Reasoning #Human-AI Interaction

March 22, 2026

[Paper Review] LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Recent advances in diffusion models have greatly improved text-to-video generation, enabling personalized content creation through fine-grained control over foreground and background elements.

#Review #Personalized Video Generation #Multi-Subject #Face-Attribute Alignment #Diffusion Models #Attention Mechanisms #Relational Embedding #Text-to-Video

March 22, 2026

[Paper Review] LoopRPT: Reinforcement Pre-Training for Looped Language Models

State-of-the-art Large Language Models (LLMs) are trained to "think" through explicit text generation, as in Chain-of-Thought (CoT) prompting.

March 22, 2026

[Paper Review] Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Recent Large Language Models (LLMs) show strong general intelligence and reasoning ability, but their multilingual performance is severely imbalanced.

#Review #LLMs #Multilinguality #Encoder-Decoder #Optimal Transport #Cross-Model Mapping #Language-on-Demand #NMT

March 22, 2026

[Paper Review] Hyperagents

Existing self-improving AI systems mostly rely on a fixed meta agent, which fundamentally limits how far the self-improvement mechanism itself can be improved.

#Review #Hyperagents #Metacognitive Self-modification #Self-improving AI #Open-ended Exploration #Darwin Gödel Machine #Meta-learning #Robotics Reward Design #Olympiad-level Math Grading

March 22, 2026

[Paper Review] How Well Does Generative Recommendation Generalize?

Generative Recommendation (GR) models outperform conventional item ID-based models and have emerged as a promising paradigm for sequential recommendation.

March 22, 2026

[Paper Review] HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Vision-Language Models (VLMs) still struggle with fine-grained, multi-step, complex vision-language reasoning tasks.

#Review #Vision-Language Models #Multi-Hop Reasoning #Data Synthesis #Reinforcement Learning with Verifiable Rewards #Chain-of-Thought #Generalizable Reasoning #Perception-level Hops #Instance-chain Hops

March 22, 2026