#Ultra-Long Context

1 post

[Paper Review] FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

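This paper aims to resolve the KV cache memory bottleneck that arises when processing ultra-long contexts. Existing LLMs must keep the entire historical context resident in GPU memory, so GPU memory requirements grow linearly with context length, which is a critical limitation.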

#Review #Large Language Models #Ultra-Long Context #Sparse Attention #KV Cache Compression #Lookahead Sparse Attention #Neural Memory Indexer #Decoupled Training

June 8, 2026