#GGUF

2개의 포스트

[논문리뷰] RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference

최근 Large Language Models (LLMs)는 자연어 처리 분야를 혁신했지만, FP16 포맷의 Llama-2-13B 모델이 26GB 의 memory를 요구하는 등 막대한 memory requirement로 인해 consumer GPU나 edge device에 배포하는 데 어려움을 겪는 Memory Wall 문제가 존재합니다.

#Review #Mixed-Precision Quantization #Reinforcement Learning #Post-Training Quantization #Large Language Models #Policy Transfer #Scale Folding #GGUF #On-Device Inference

2026년 3월 18일

[논문리뷰] Performance Trade-offs of Optimizing Small Language Models for E-Commerce

본 논문은 대규모 상용 LLM의 높은 비용과 리소스 제약 문제를 해결하기 위해, 소규모 오픈-웨이트 모델이 특정 도메인 작업에서 효율적인 대안이 될 수 있는지 검증하는 것을 목표로 합니다.

#Review #Small Language Models #E-commerce #Intent Recognition #Fine-tuning #QLoRA #Quantization #GPTQ #GGUF #Hardware-aware Optimization

2025년 10월 31일