#Layer-level Sparsity

1개의 포스트

[논문리뷰] Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

본 논문은 기존 long-context LLM 추론에서 발생하는 quadratic computational complexity와 기존 하이브리드 어텐션 기법들의 한계를 해결하고자 합니다.

#Review #Large Language Models #Long-context Inference #Hybrid Attention #Dynamic Routing #Layer-level Sparsity #Context-aware

2026년 4월 9일