#Long-context

4 posts

[Paper Review] A Sovereign, Open-Source Foundation Model for German and English

This work aims to address three core limitations of existing open-source models. First, many "open" models release only their weights while keeping the data and training recipe opaque, which undermines reproducibility.

#Review #Foundation Model #Mixture-of-Experts #Mamba-Transformer #Long-context #Sovereign AI #German-English #Open-Source

July 12, 2026

[Paper Review] Gemma 4 Technical Report

This paper proposes the Gemma 4 model family to achieve strong multimodal understanding, complex reasoning ability, and compute efficiency at the same time, as the current LLM ecosystem demands.

#Review #Multimodal #Mixture-of-Experts #Reasoning Trace #Speculative Decoding #Quantization-Aware Training #Long-context #Encoder-free

July 7, 2026

[Paper Review] Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding

This paper aims to resolve the Pareto tradeoff between speed and accuracy (MAT) that existing tree-based speculative decoding faces.

#Review #Speculative Decoding #Tree Construction #Dynamic Pruning #Retrieval-based #GPU-resident #Budget Compensation #Long-context

May 19, 2026

[Paper Review] LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

This paper addresses the problem that a model's intrinsic representations are underused during the RL process meant to strengthen the long-context reasoning ability of LLMs.

#Review #Reinforcement Learning #Large Language Models #Long-context #Sparsity #Activation Patterns #Saliency-guided

April 16, 2026