#Hardware Efficiency

2 posts

[Paper Review] RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

The paper aims to resolve the critical trade-off between performance and efficiency that arises in extreme 2-bit quantization of LLMs.

#Review #LLM Quantization #2-bit Quantization #Residual Binarization #Quantization-Aware Training (QAT) #Inter-Path Adaptation #Hardware Efficiency #Model Compression #Low-Bit LLMs

February 8, 2026

[Paper Review] INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Modern AI hardware is increasingly adopting low-precision floating-point (FP) formats to handle outliers in LLMs, yet a unified comparison of FP and integer (INT) quantization across different granularities is still lacking.

#Review #Quantization #Low-bit Formats #Integer Quantization #Floating-Point Quantization #Large Language Models (LLMs) #Hardware Efficiency #Fine-Grained Quantization #MXINT8

November 9, 2025