#Mamba2

2 posts

[vllm] vLLM Mamba2 SSD Kernel Warmup: The Secret Behind a 91% Reduction in First-Request Latency

An analysis of the Triton kernel warmup optimization that cut first-request latency for vLLM Mamba2 models by 91%.

#vLLM #Mamba2 #Triton #Kernel Optimization #Latency Reduction #Deep Learning Inference

May 12, 2026

[Paper Review] StateX: Enhancing RNN Recall via Post-training State Expansion

RNN-family models process long contexts more efficiently than Transformers, but their fixed-size recurrent state weakens their recall ability over long contexts. This paper sets out to solve that problem.

#Review #RNN #State Expansion #Post-training #Long-context Recall #Linear Attention #State Space Models #GLA #Mamba2

September 29, 2025