#Training-Inference Alignment

1 post

[Paper Review] Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

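This paper proposes an efficient hybrid architecture. It targets the computation and I/O overhead that standard Softmax Attention incurs at long sequence lengths, and it aims to overcome the performance limits of pure Linear Attention models.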

#Review #Long-Context LLM #Hybrid Attention #Linear Attention #Mixture-of-Experts #FP8 Training #GPU Optimization #Training-Inference Alignment #Reinforcement Learning

October 23, 2025