#Latent Attention

1 post

[Paper Review] TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference

A detailed review of the paper 'TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference', posted on arXiv by Di Yin.

#Review #LLM Inference #Tensor Parallelism #KV Cache Optimization #Latent Attention #Memory Efficiency #Decoding Speedup #Prefill/Decode Separation #Reparameterization

August 25, 2025