#PTQ

7 posts

[Paper Review] E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

This paper addresses the performance degradation that occurs when low-bit quantization is applied after model merging.

#Review #Post-Merge Quantization #Model Merging #PTQ #Quantization Deviation #Merged-Weight Anchoring #Expert-Guided Calibration

May 18, 2026

[llm-compressor] AutoRound: Optimizing Rounding with Sign Gradient Descent

An analysis of how the AutoRound paper's SignSGD-based rounding optimization is implemented in llm-compressor, along with the nsamples/iters/seqlen parameters

#llm-compressor #AutoRound #Quantization #PTQ

April 13, 2026

[llm-compressor] AWQ: Implementing Activation-Aware Weight Quantization

An analysis of how the AWQ paper's salient weight scaling idea is implemented in llm-compressor through mappings and dynamic_mappings

#llm-compressor #AWQ #Quantization #PTQ

April 13, 2026

[llm-compressor] GPTQ: Implementing Post-Training Quantization Based on Second-Order Information

An analysis of how the GPTQ paper's Hessian-based quantization is implemented in llm-compressor, covering the block_size/dampening_frac/actorder parameters and how quantize_weight is called at the end of a sequential epoch

#llm-compressor #GPTQ #Quantization #PTQ

April 13, 2026

[llm-compressor] Model-Free Entrypoint: PTQ from a Checkpoint Alone, Without a Model Definition

A code-level analysis of how llm-compressor's model_free_ptq opens safetensors shards directly and quantizes them without calibration

#llm-compressor #Entrypoint #Model-Free #PTQ

April 13, 2026

[llm-compressor] Oneshot Entrypoint: A Compression Pipeline Completed in a Single Call

A code-level analysis of how oneshot(), llm-compressor's top-level API, handles model loading, calibration, recipe application, and saving in a single call

#llm-compressor #Entrypoint #Oneshot #PTQ

April 13, 2026

[llm-compressor] Full Project Architecture Analysis - Overview and Table of Contents

The overview post for a series that analyzes llm-compressor's overall architecture across 11 layers and catalogs 45 posts and the implementations of 8 papers

#llm-compressor #Architecture #Quantization #Pruning #PTQ

April 13, 2026