#Linear Attention

23 posts

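[SGLang] Lightning Attention: A High-Speed Linear Attention Implementation

Analyzes SGLang's Lightning Attention. Covers the IO-aware linear attention implementation, chunk-based processing, and the speedup over conventional linear attention, with code.

#sglang #Lightning Attention #Linear Attention #IO-aware

April 11, 2026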

[SGLang] KDA (Kernel-Driven Attention): Kernel-Based Linear Attention

Analyzes SGLang's KDA linear attention. Covers linear-time attention using kernel functions and the Triton/CuteDSL kernel implementations, with code.

#sglang #KDA #Kernel-Driven Attention #Linear Attention

April 11, 2026

[SGLang] GDN (Gated Diagonal Net): Gate-Based Linear Attention

Analyzes SGLang's GDN linear attention. Covers the linear-complexity attention implementation of Gated Diagonal Net, its gating mechanism, and the choice among FlashInfer/Triton/CuteDSL kernels, with code.

#sglang #GDN #Linear Attention #Gated Diagonal Net

April 11, 2026

[Paper Review] Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

This paper proposes GateControl, a unified gated condition-injection module for Linear Attention-based models. By using learnable gates to selectively preserve only the important condition information for each token, it achieves strong control performance without the Multimodal Attention that earlier approaches rely on.

#Review #Diffusion Transformer #Linear Attention #Controllable Generation #Gated Condition Injection #On-device AI

April 2, 2026

[SGLang] Fusing GDN's kkt + solve_tril into a Single Triton Kernel

Merges the Gated Delta Network's K@K^T computation and triangular solve into a single Triton kernel, eliminating the HBM round trip.

#SGLang #Triton #Kernel Fusion #Linear Attention

March 29, 2026

[sglang] Fixing the AMD/ROCm Startup Crash: Lazy Import for the CuteDSL KDA Kernel

An analysis of how a startup crash on AMD/ROCm, caused by a top-level import of the CuteDSL KDA kernel in SGLang, was fixed with a lazy import.

#SGLang #AMD #ROCm #Bug Fix #Lazy Import #Linear Attention

March 25, 2026

[Paper Review] Memory Caching: RNNs with Growing Memory

A detailed review of the paper 'Memory Caching: RNNs with Growing Memory', posted to arXiv by Meisam Razaviyayn.

#Review #Recurrent Neural Networks #Memory Caching #Sequence Modeling #Long-Context #Transformers #Linear Attention #Language Modeling #Retrieval Tasks

March 1, 2026

[Paper Review] HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

A detailed review of the paper 'HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation', posted to arXiv.

#Review #Sequential Recommendation #Hybrid Attention #Temporal-Aware #Long Sequences #Generative Recommendation #Linear Attention #Softmax Attention

February 25, 2026

[Paper Review] Test-Time Training with KV Binding Is Secretly Linear Attention

A detailed review of the paper 'Test-Time Training with KV Binding Is Secretly Linear Attention', posted to arXiv.

#Review #Test-Time Training #KV Binding #Linear Attention #Sequence Modeling #Model Interpretation #Computational Efficiency #Dynamic Adaptation

February 24, 2026

[Paper Review] 2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

A detailed review of the paper '2Mamba2Furious: Linear in Complexity, Competitive in Accuracy', posted to arXiv by Eric C. Larson.

#Review #Linear Attention #Mamba-2 #High-Order Attention #Model Efficiency #Long Context #Transformer #State Space Models

February 19, 2026

[Paper Review] SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

A detailed review of the paper 'SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer', posted to arXiv.

#Review #Video Diffusion Models #Sparse Attention #Linear Attention #Computational Efficiency #Transformer Tuning #Video Generation #LoRA #Gating Mechanism

January 25, 2026

[Paper Review] MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

A detailed review of the paper 'MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head', posted to arXiv.

#Review #Linear Attention #Multi-Head Attention #Transformer #Global Context Collapse #Representational Diversity #Image Generation #NLP #Video Generation

January 12, 2026

[Paper Review] Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

A detailed review of the paper 'Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers', posted to arXiv.

#Review #Language Models #Transformer Architecture #Canon Layers #Synthetic Pretraining #Reasoning Depth #Linear Attention #State-Space Models #NoPE

December 21, 2025

[Paper Review] InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

A detailed review of the paper 'InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models', posted to arXiv.

#Review #Vision-Language Models #Linear Attention #Sliding Window Attention #Gated DeltaNet #Long-Context Understanding #Efficiency #Hybrid Architecture #Multimodal Learning

December 10, 2025

[Paper Review] Higher-order Linear Attention

A detailed review of the paper 'Higher-order Linear Attention', posted to arXiv.

#Review #Linear Attention #Higher-order Interactions #Causal Streaming #Associative Scans #Prefix Summaries #Transformer Architectures #State Space Models

November 9, 2025

[Paper Review] Kimi Linear: An Expressive, Efficient Attention Architecture

A detailed review of the paper 'Kimi Linear: An Expressive, Efficient Attention Architecture', posted to arXiv.

#Review #Linear Attention #Hybrid Architecture #Kimi Delta Attention (KDA) #Gating Mechanism #Long-Context Modeling #Efficient Inference #Transformer

October 31, 2025

[Paper Review] Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

A detailed review of the paper 'Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning', posted to arXiv.

#Review #Long-Context LLM #Hybrid Attention #Linear Attention #Mixture-of-Experts #FP8 Training #GPU Optimization #Training-Inference Alignment #Reinforcement Learning

October 23, 2025

[Paper Review] Native Hybrid Attention for Efficient Sequence Modeling

A detailed review of the paper 'Native Hybrid Attention for Efficient Sequence Modeling', posted to arXiv by Yu Cheng.

#Review #Sequence Modeling #Hybrid Attention #Transformer Architecture #Linear Attention #Sliding Window Attention #Long Context #Large Language Models (LLMs) #Efficiency

October 9, 2025

[Paper Review] SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

A detailed review of the paper 'SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention', posted to arXiv.

#Review #Diffusion Transformers #Sparse Attention #Linear Attention #Model Acceleration #Video Generation #Attention Mechanisms #Fine-tuning

September 30, 2025

[Paper Review] SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

A detailed review of the paper 'SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer', posted to arXiv.

#Review #Video Generation #Diffusion Model #Linear Attention #Transformer #Long Video #Efficient Inference #Constant Memory #Low-Cost Training #RTX Deployment

September 30, 2025

[Paper Review] StateX: Enhancing RNN Recall via Post-training State Expansion

A detailed review of the paper 'StateX: Enhancing RNN Recall via Post-training State Expansion', posted to arXiv by Zhiyuan Liu.

#Review #RNN #State Expansion #Post-training #Long-context Recall #Linear Attention #State Space Models #GLA #Mamba2

September 29, 2025

[Paper Review] Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

A detailed review of the paper 'Speed Always Wins: A Survey on Efficient Architectures for Large Language Models', posted to arXiv by Jusen Du.

#Review #Large Language Models #Efficient Architectures #Transformer Optimization #Linear Attention #State Space Models #Mixture-of-Experts #Sparse Attention #Diffusion LLMs

August 19, 2025

[Paper Review] On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

A detailed review of the paper 'On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective', posted to arXiv by Eric C. Larson.

#Review #Softmax Attention #Linear Attention #Recurrent Neural Networks (RNNs) #Taylor Series Expansion #Attention Mechanisms #Expressiveness #Transformer Architectures

August 2, 2025