#Model Scaling

4 posts

[Paper Review] Delta Attention Residuals

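This paper aims to resolve the routing collapse problem that arises in existing Attention Residuals. In existing models, each layer's output $h_i$ is the cumulative sum of the preceding layers, so the redundancy between adjacent $h_i$ and $h_{i-1}$ becomes extremely high as the layers get deeper.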

#Review #Attention Residuals #Delta Representation #Additive Routing #Transformer #Model Scaling #Fine-tuning

May 19, 2026

[Paper Review] Measuring Maximum Activations in Open Large Language Models

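This paper re-examines the conventional assumption that, across the latest open LLM ecosystem, the dynamic range of activations simply scales with parameter count. It systematically measures each model's Maximum Activation Magnitude (MM) to identify risks at deployment time.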

#Review #Large Language Models #Activation Range #Quantization #Maximum Activation #LLM Inference #Residual Stream #Model Scaling

May 18, 2026

[Paper Review] MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

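The paper aims to resolve the performance trade-off between visual understanding and generation that existing unified multimodal LLMs suffer from, in particular the degradation on text-rich benchmarks.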

#Review #Multimodal LLM #Hybrid Tokenizer #Text-to-Image Generation #Visual Question Answering #Autoregressive Model #Diffusion Decoder #Unified Architecture #Model Scaling

September 22, 2025

[Paper Review] Direct Multi-Token Decoding

This paper aims to accelerate inference by addressing the inefficient use of layers in large language models (LLMs).

#Review #LLM Inference #Multi-token Decoding #Transformer Architecture #Layer Specialization #Cyclical Refilling #Inference Speedup #Model Scaling

October 16, 2025