Latest Posts

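[SGLang] Refactoring the Diffusion JIT Kernel Test Layout and Tightening CI Triggers

Moves the JIT kernel tests and benchmarks into a diffusion/ subfolder and narrows the CI triggers so they respond only to relevant paths.

#SGLang #CI/CD #Testing #Refactoring

March 26, 2026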

[triton] Adding Concurrency Sanitizer (ConSan) Support to the AMD Backend

Analyzes how MBarrierOpInterface, target hooks, capture count estimation, and related pieces were implemented to support ConSan, which detects GPU concurrency bugs, on AMD GPUs.

#Triton #AMD #GPU #ConSan #Sanitizer #Concurrency

March 26, 2026

[sglang] SGLang's FA3 Decode Optimization: Introducing get_scheduler_metadata

Analyzes an optimization that precomputes FlashAttention-3 tile scheduling metadata to eliminate per-layer overhead.

#SGLang #FlashAttention #CUDA #Optimization #LLM

March 25, 2026

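[triton] Triton AMD Backend Optimization: Improving GEMM Performance Through SGPR Usage and Loop Optimization

Analyzes a case that improved performance in Triton's AMD GPU kernels by removing VGPR dependencies and optimizing loop branches.

#Triton #AMD #GPU #Optimization #GEMM

March 25, 2026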

[SGLang] Optimizing the Diffusion Triton Rotary Embedding with Multi-Head Parallel Processing

Restructures the Triton rotary embedding kernel to process multiple heads per token at once, reducing the number of kernel launches.

#SGLang #Triton #Diffusion #Rotary Embedding

March 26, 2026

[Paper Review] When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

Multimodal large language models (MLLMs) have recently shown strong performance on reasoning tasks, but this progress relies mainly on high-quality annotated data or teacher-model distillation, which is costly and hard to scale.

#Review #Unsupervised Self-Evolution #Multimodal Reasoning #Consistency-Based Reward #Judge Modulation #Group Relative Policy Optimization (GRPO) #Policy Updates #Mathematical Reasoning #Large Language Models

March 25, 2026

[Paper Review] Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning

Existing Multimodal Large Language Models (MLLMs) struggle with 3D spatial reasoning because they are overly anchored to 2D visual signals and fail to build structured abstractions of 3D environments.

#Review #Multimodal Large Language Models (MLLMs) #Spatial Reasoning #Textual Representation #Allocentric Context #Egocentric Video #Prompting Methods #VSI-Bench #OST-Bench

March 25, 2026

[Paper Review] UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

As Multimodal Large Language Models (MLLMs) advance, interest in autonomous mobile GUI agents is growing, but existing methods face two problems: inefficient learning from failed trajectories, and ambiguous credit assignment caused by sparse rewards in long-horizon GUI tasks.

#Review #GUI Agent #Self-Evolving Learning #Rejection Fine-Tuning (RFT) #Group Relative Self-Distillation (GRSD) #Credit Assignment #Sparse Rewards #Mobile Automation #Multimodal Large Language Models (MLLMs)

March 25, 2026

[Paper Review] Toward Physically Consistent Driving Video World Models under Challenging Trajectories

In autonomous driving simulation, video world models are gaining importance as an alternative to costly real-world data collection and high-fidelity physics simulators. Existing driving world models are typically trained on real driving datasets, which mostly cover safe, ordinary scenarios.

#Review #Driving World Models #Physical Consistency #Video Generation #Challenging Trajectories #Autonomous Driving #Heterogeneous Dataset

March 25, 2026

[Paper Review] T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

Existing LLM red-teaming research has focused mainly on eliciting harmful text outputs from models, which overlooks the new safety risks of LLM agents that can perform multi-step tool execution through unified standards such as the Model Context Protocol (MCP).

#Review #LLM Agents #Red-Teaming #Vulnerability Discovery #Trajectory-aware Search #MAP-Elites #Tool Call Graph #Attack Realization Rate

March 25, 2026

[Paper Review] StreamingClaw Technical Report

Applications such as embodied intelligence, AI hardware, autonomous driving, and intelligent cockpits depend heavily on a real-time perception-decision-action closed loop, which imposes strict requirements on real-time streaming video understanding.

#Review #Streaming Video Understanding #Embodied Intelligence #Multi-agent Systems #Long-term Memory #Proactive Interaction #Real-time Inference #OpenClaw

March 25, 2026

[Paper Review] PLDR-LLMs Reason At Self-Organized Criticality

This work addresses the core question of how reasoning ability emerges in Large Language Models (LLMs) and how it can be quantified effectively.

#Review #PLDR-LLMs #Self-Organized Criticality #Reasoning #Deductive Outputs #Order Parameter #Phase Transitions #Generalization #Attention Mechanism

March 25, 2026

[Paper Review] OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

Proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, but open-source alternatives lag considerably behind.

#Review #Unified Video Generation #Multimodal Composition #Reasoning-Augmented #IntelligentVBench #MLLM #MMDiT #DeepStacking #Free-form Inputs

March 25, 2026

[Paper Review] LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis

Novel View Synthesis (NVS) is an important task that generates images from new viewpoints based on existing views.

#Review #Novel View Synthesis (NVS) #Latent Geometry #Real-time Rendering #3D Inductive Biases #Encoder-Decoder #VGGT #Generalization #Diffusion Models

March 25, 2026

[Paper Review] GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

Multimodal Large Language Models (MLLMs) are increasingly used as the perceptual backbone of autonomous agents in 3D environments, from robotics to virtual worlds.

March 25, 2026

[Paper Review] EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Existing video understanding systems based on multimodal large language models (MLLMs) act as passive recognizers, processing entire videos or uniformly sampled frames without any adaptive reasoning.

#Review #Video Agent #Reinforcement Learning #MLLM #Planning-before-Perception #Tool Use #KTO #GRPO

March 25, 2026

[Paper Review] CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare

Multimodal agentic pipelines are transforming human-computer interaction, but most focus on short-horizon or general-purpose applications, and long-horizon automation, particularly in healthcare, remains largely unexplored.

#Review #Multi-Agent Framework #Healthcare Automation #Long-Horizon Tasks #Actor-Critic #Tool Grounding #Dual-Memory #CareFlow #GUI Agents

March 25, 2026

[Paper Review] Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Recent advances in Large Language Models (LLMs) have enabled agentic systems that can reason, plan, and act on complex tasks, but it remains unclear whether they can allocate resources effectively in uncertain environments. Resource allocation differs fundamentally from short-term reactive decision-making.

#Review #LLM Agents #Resource Allocation #Enterprise Simulation #Financial Management #Uncertainty #Long-Horizon Decision-Making #CFO

March 25, 2026

[Paper Review] CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

The vision of intelligent agents automating complex desktop workflows has been held back by a shortage of continuous, high-quality human demonstration videos.

#Review #Computer-Use Agents #Video Demonstrations #Human Annotation #Desktop Applications #Visual Grounding #Action Prediction #Multi-layered Reasoning #Foundation Action Models

March 25, 2026

[Paper Review] 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models

Video Diffusion Transformers (DiTs) show outstanding video generation capability, but high memory usage and heavy computational cost severely constrain their real-world deployment.

#Review #Video Diffusion Transformers #Mixed-Precision Quantization #Inference Acceleration #Temporal Delta Cache #NVFP4 #INT8 #Post-Training Quantization #Memory Reduction

March 25, 2026