#llm-compressor

53 posts

[llm-compressor] Sentinel & Typing: Sentinel Objects and Type Aliases

How the Sentinel class in sentinel.py distinguishes a default value from an explicit None, plus an analysis of the project-wide type aliases in typing.py

#llm-compressor #Sentinel #Typing

April 13, 2026

[llm-compressor] PyTorch Utils: Sparsification Statistics and Module Helpers

An analysis of the common PyTorch utilities provided by sparsification_info and the module.py helpers in the pytorch/utils and utils/pytorch directories

#llm-compressor #PyTorch #Utils

April 13, 2026

[llm-compressor] Dataset Calibration: c4/wikitext/ultrachat Loaders

An analysis of how the datasets and transformers/data directories load and tokenize calibration datasets

#llm-compressor #Dataset #Calibration

April 13, 2026

[llm-compressor] Modeling Overrides: Model-Specific Patches for DeepSeek/Llama4/Qwen3 and Others

An analysis of how the modeling directory overrides special architectures such as DeepSeek-V3, Llama-4, Qwen3-MoE, and GPT-OSS to fit llm-compressor

#llm-compressor #Modeling #MoE #Override

April 13, 2026

[llm-compressor] Compression Save: Saving compressed-tensors Checkpoints

An analysis of how the transformers/compression directory serializes compressed models into the compressed-tensors format so that vLLM/SGLang can load them

#llm-compressor #Compression #Save #compressed-tensors

April 13, 2026

[llm-compressor] Transformers Tracing: Model Graph Tracing and Partial Forward

An analysis of how debug.py in the transformers/tracing directory traces HuggingFace models with torch.fx to enable subgraph partitioning

#llm-compressor #Tracing #HuggingFace #FX

April 13, 2026

[llm-compressor] iMatrix Transform: Importance-Matrix-Based Weight Rescaling

An analysis of how the IMatrixGatherer Modifier collects E[x^2] of input activations and rescales weights

#llm-compressor #iMatrix #Transform

April 13, 2026

[llm-compressor] SpinQuant: Quantization with Learned Rotation Matrices

An analysis of the four rotations (R1/R2/R3/R4) in the SpinQuant paper, the Cayley SGD-based training method, and llm-compressor's mappings/norm_mappings implementation

#llm-compressor #SpinQuant #Quantization #Rotation

April 13, 2026

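[llm-compressor] QuIP: 2-Bit Quantization via Random Orthogonal Transforms

An analysis of the incoherence processing idea from the QuIP paper and the llm-compressor implementation that enables 2-bit quantization with random Hadamard/orthogonal matrices

#llm-compressor #QuIP #Quantization #2bit

April 13, 2026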

[llm-compressor] Transform Overview: The Family of Weight Rotation/Transform Modifiers

An analysis of the "transform weights, then quantize" pattern shared by llm-compressor's transform family (QuIP/SpinQuant/iMatrix/SmoothQuant transform)

#llm-compressor #Transform #Overview

April 13, 2026

[llm-compressor] Magnitude Pruning: Magnitude-Based and Constant-Sparsity Modifiers

An analysis of the data-free design in which MagnitudePruningModifier prunes using weight magnitude alone, and of how ConstantPruningModifier preserves an existing mask

#llm-compressor #Pruning #Magnitude

April 13, 2026

[llm-compressor] Wanda: Pruning by Activation-Weighted Norm

How the |W| * ||X||_2 importance formula from the Wanda paper is implemented in llm-compressor, with a comparison to SparseGPT

#llm-compressor #Wanda #Pruning

April 13, 2026

[llm-compressor] SparseGPT: A One-Shot LLM Pruning Implementation

How OBS-based pruning from the SparseGPT paper is implemented in llm-compressor, with an analysis of Hessian accumulation and 2:4 sparsity mask generation

#llm-compressor #SparseGPT #Pruning

April 13, 2026

[llm-compressor] Pruning Overview: Structure of the OBCQ Modifier Family

An analysis of how llm-compressor's pruning/obcq layer abstracts SparseGPT/Wanda/Magnitude Pruning over a common base

#llm-compressor #Pruning #Overview

April 13, 2026

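[llm-compressor] Logarithmic Equalization: Log-Scale Channel Equalization

An analysis of the principle and implementation by which LogEqualizationModifier equalizes inter-channel differences in weight distribution on a log scale to produce a quantization-friendly distribution

#llm-compressor #LogEqualization #Quantization

April 13, 2026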

[llm-compressor] AutoRound: Optimizing Rounding with Sign Gradient Descent

How the SignSGD-based rounding optimization from the AutoRound paper is implemented in llm-compressor, with an analysis of the nsamples/iters/seqlen parameters

#llm-compressor #AutoRound #Quantization #PTQ

April 13, 2026

[llm-compressor] SmoothQuant: Shifting Quantization Difficulty from Activations to Weights

How the activation smoothing technique from the SmoothQuant paper is implemented in llm-compressor, with an analysis of per-channel scale selection and RMSNorm absorption

#llm-compressor #SmoothQuant #Quantization #W8A8

April 13, 2026

[llm-compressor] AWQ: An Activation-Aware Weight Quantization Implementation

An analysis of how the salient weight scaling idea from the AWQ paper is implemented in llm-compressor through mappings and dynamic_mappings

#llm-compressor #AWQ #Quantization #PTQ

April 13, 2026

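[llm-compressor] GPTQ: A Post-Training Quantization Implementation Based on Second-Order Information

How the Hessian-based quantization from the GPTQ paper is implemented in llm-compressor, with an analysis of the block_size/dampening_frac/actorder parameters and the quantize_weight call made at the end of a sequential epoch

#llm-compressor #GPTQ #Quantization #PTQ

April 13, 2026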

[llm-compressor] Group Size Validation: Group Size Compatibility Checks

An analysis of how the validate_group_size function in group_size_validation.py verifies that a layer's shape is compatible with group_size and provides error messages

#llm-compressor #Quantization #Validation

April 13, 2026

[llm-compressor] Quantization Calibration: update_weight_zp_scale and Observer Registration

An analysis of how helper functions in calibration.py, such as update_weight_zp_scale and update_weight_global_scale, call observers per module to determine scales

#llm-compressor #Quantization #Calibration

April 13, 2026

[llm-compressor] Quantization Base: QuantizationModifier and QuantizationMixin

An analysis of how QuantizationModifier manages the PTQ/QAT lifecycle and how QuantizationMixin handles observer registration, calibration, and termination

#llm-compressor #Quantization #Modifier

April 13, 2026

[llm-compressor] iMatrix Observer: Input-Channel Importance-Weighted MSE

An analysis of how IMatrixMSEObserver collects E[x^2] of the input through a forward pre-hook to compute per-channel importance, then runs an MSE grid search using those weights

#llm-compressor #Observer #iMatrix #Quantization

April 13, 2026

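[llm-compressor] Moving Average Observer: An Online Observer Based on Exponential Moving Averages

An analysis of how MovingAverageObserverBase accumulates min/max over multiple batches with an exponential moving average to provide stable scales

#llm-compressor #Observer #MovingAverage

April 13, 2026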

[llm-compressor] MSE Observer: Minimizing Quantization Error with Grid Search

An analysis of the grid search logic by which MemorylessMSEObserver and MovingAverageMSEObserver progressively shrink the min/max range to minimize quantization MSE

#llm-compressor #Observer #MSE #Quantization

April 13, 2026

[llm-compressor] MinMax Observer: Three min/max Computation Policies

A code analysis of how the three variants MemorylessMinMaxObserver, StaticMinMaxObserver, and MinMaxObserver each aggregate min/max

#llm-compressor #Observer #Quantization #MinMax

April 13, 2026

[llm-compressor] Observers Base: The Abstract Foundation for Scale/Zero-Point Computation

An analysis of how the Observer base class computes scale and zero point through the get_min_max hook and calls compressed-tensors' calculate_qparams

#llm-compressor #Observer #Quantization

April 13, 2026

[llm-compressor] Modifier Interface: Abstract Contract and Type Checking

An analysis of the initialized/finalized properties and the initialize/finalize/update_event abstract methods defined by the ModifierInterface ABC

#llm-compressor #Modifier #Interface #ABC

April 13, 2026

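[llm-compressor] Modifier Factory: Creating Modifier Instances from String Names

An analysis of the mechanism by which ModifierFactory recursively scans packages to register Modifier subclasses and builds actual instances from the string names in a recipe YAML

#llm-compressor #Modifier #Factory #Registry

April 13, 2026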

[llm-compressor] Modifier Base: The Base Class Every Modifier Inherits

An analysis of the Modifier class's lifecycle methods (initialize/update_event/finalize), start/end hooks, and should_start/should_end condition checks

#llm-compressor #Modifier #Base

April 13, 2026

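[llm-compressor] Intermediates Cache: An Offload Cache for Subgraph Activations

An analysis of how IntermediatesCache manages memory by offloading and onloading per-batch intermediate activations between CPU and GPU, and of its prefetch mechanism

#llm-compressor #Pipeline #Memory #Offload

April 13, 2026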

[llm-compressor] Data-Free & Independent Pipeline: A Data-Free Pipeline and Per-Modifier Execution

An analysis of DataFreePipeline's forward-free design and IndependentPipeline's logic for automatically selecting a pipeline per Modifier

#llm-compressor #Pipeline #DataFree #Independent

April 13, 2026

[llm-compressor] Sequential Pipeline: Layer-Wise Subgraph Calibration

An analysis of how SequentialPipeline splits a model into subgraphs and offloads intermediate activations while running GPTQ/SparseGPT

#llm-compressor #Pipeline #Sequential #Calibration

April 13, 2026

[llm-compressor] Basic Pipeline: Calibration in a Single Forward Pass

An analysis of how BasicPipeline calibrates by running the whole model in a single forward pass, and of its loss mask and dispatch_model handling

#llm-compressor #Pipeline #Calibration

April 13, 2026

[llm-compressor] Pipeline Registry: Automatic Pipeline Selection from the Modifier List

An analysis of the logic by which the CalibrationPipeline abstract class and the from_modifiers dispatcher decide which of basic/sequential/data_free/independent to pick

#llm-compressor #Pipeline #Registry

April 13, 2026

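[llm-compressor] Events: Batch Lifecycle Hooks and Epoch Calculation Logic

An analysis of the batch/epoch/optimizer hooks defined by the EventType enum and Event dataclass, the should_update condition check, and the epoch calculation properties

#llm-compressor #Events #Core

April 13, 2026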

[llm-compressor] State & ModelLayer: The Compression State Store

An analysis of what information the State/Data/Hardware/ModifiedState dataclasses hold and the data-copy policy of the update() method

#llm-compressor #State #Core

April 13, 2026

[llm-compressor] Lifecycle: The Modifier Initialize-Event-Finalize State Machine

An analysis of how the CompressionLifecycle dataclass iterates over a Recipe's Modifier list in three phases (initialize/event/finalize), and of its event-order validation logic

#llm-compressor #Lifecycle #Core

April 13, 2026

[llm-compressor] CompressionSession: Global Singleton Session and Lifecycle Wrapper

A code analysis of how llm-compressor's CompressionSession class and the active_session() global dispatcher manage Lifecycle and State

#llm-compressor #Session #Core #Lifecycle

April 13, 2026

[llm-compressor] Recipe Metadata: Serialization Helpers and Model Metadata Structure

An analysis of the Recipe YAML serialization/merge helpers and the DatasetMetaData, ParamMetaData, LayerMetaData, and ModelMetaData Pydantic models

#llm-compressor #Recipe #Metadata

April 13, 2026

[llm-compressor] Recipe DSL: A Declarative Language for Composing Modifiers in YAML

A code-level analysis of how llm-compressor's Recipe class converts YAML/JSON/Python strings into a Modifier list, and of the stage/group/args structure

#llm-compressor #Recipe #DSL #YAML

April 13, 2026

[llm-compressor] Args Dataclasses: Splitting Flat Kwargs into Three Structures

An analysis of how the three dataclasses ModelArguments, DatasetArguments, and RecipeArguments divide up the arguments of a oneshot() call, and of the HfArgumentParser-based parsing structure

#llm-compressor #Args #Configuration

April 13, 2026

[llm-compressor] Model-Free Entrypoint: PTQ from Checkpoints Alone, Without a Model Definition

A code-level analysis of how llm-compressor's model_free_ptq opens safetensors shards directly and quantizes without calibration

#llm-compressor #Entrypoint #Model-Free #PTQ

April 13, 2026

[llm-compressor] Oneshot Entrypoint: A Compression Pipeline Completed in a Single Call

A code-level analysis of how oneshot(), llm-compressor's top-level API, handles model loading, calibration, recipe application, and saving in one call

#llm-compressor #Entrypoint #Oneshot #PTQ

April 13, 2026

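[llm-compressor] Full Project Architecture Analysis - Overview and Table of Contents

The overview post of a series that analyzes llm-compressor's overall architecture in 11 layers and organizes 45 posts and 8 paper implementations

#llm-compressor #Architecture #Quantization #Pruning #PTQ

April 13, 2026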

[llm-compressor] Analysis of Adding llm-compressor Support for Quantizing Gemma4 MoE Models

Analyzes the PR that adds support to llm-compressor for quantizing and optimizing Gemma4 MoE models.

#llm-compressor #Gemma4 #MoE #Quantization #Optimization #Tech Blog

April 7, 2026

[llm-compressor] GPTQ Block Quantization Support

Adds block quantization to GPTQ quantization for finer-grained quantization group partitioning and improved quality

#llm-compressor #Performance

March 31, 2026

[llm-compressor] iMatrix Weighted MSE Observer - Importance-Matrix-Based Quantization

A weighted MSE observer that uses the Importance Matrix (iMatrix) to prioritize preserving quantization precision for important weights

#llm-compressor #Performance

March 27, 2026

[llm-compressor] AWQ DDP - Distributed Data Parallel AWQ Quantization

Applies DDP (Distributed Data Parallel) to AWQ quantization to speed up calibration on multiple GPUs

#llm-compressor #Performance

March 18, 2026

[llm-compressor] Intermediates Cache Prefetch - Prefetching Intermediate Results

Prefetches the intermediate results of quantization calibration to reduce wait time in layer-sequential processing

#llm-compressor #Performance

March 17, 2026

[llm-compressor] DataLoader Optimization and Single-pass Weight Calibration

Improves the speed and flexibility of the quantization pipeline with expanded DataLoader options and single-pass weight calibration

#llm-compressor #Performance

February 18, 2026

[llm-compressor] Memoryless Observers - Memory-Efficient Weight Observers

Switches the weight observers used in quantization calibration to a memoryless design, greatly reducing memory usage

#llm-compressor #Performance

January 19, 2026

[llm-compressor] Disable LM Head - Disabling Unnecessary LM Head Computation

Disables the LM Head layer's forward pass during quantization calibration to save time and memory

#llm-compressor #Performance

December 5, 2025