Activation-aware Weight Quantization
Rather than quantizing every weight in a model, AWQ preserves a small percentage of weights that are important for LLM performance. This significantly reduces quantization loss, so models can run in 4-bit precision without noticeable performance degradation.
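A minimal sketch of running a pre-quantized AWQ checkpoint through transformers, assuming the autoawq package is installed; the model id and prompt below are illustrative examples, not prescribed by AWQ itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example AWQ-quantized checkpoint from the Hub (any AWQ model id works here)
model_id = "TheBloke/zephyr-7B-alpha-AWQ"

# Weights are already stored in 4-bit AWQ format, so loading is the same as usual
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("AWQ keeps salient weight channels accurate", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```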
AWQ Usage
Quantization
https://huggingface.co/docs/transformers/main/en/quantization#awq
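Producing an AWQ checkpoint yourself is typically done outside transformers, e.g. with the AutoAWQ library. The sketch below assumes `pip install autoawq`; the base model path, output directory, and quantization settings are example values.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"   # example base model to quantize
quant_path = "mistral-7b-awq"              # example output directory

# Typical AWQ settings: 4-bit weights with group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate activations to find salient channels, then quantize the weights to 4-bit
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The saved directory can then be loaded with `AutoModelForCausalLM.from_pretrained` as in the example above.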

Seonglae Cho