AWQ

Activation-aware Weight Quantization

AWQ doesn’t treat all the weights in a model equally. It uses activation statistics to identify the small percentage of weights that matter most for LLM performance, and protects them during quantization. This significantly reduces quantization loss, so you can run models in 4-bit precision with little to no performance degradation.
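
A toy NumPy sketch can make the idea concrete. It is not the AutoAWQ or llm-awq implementation: the data is random, the function names (`quantize_4bit`, `quantize_protecting_salient`) and the 1% `keep_fraction` are made up for illustration, and the salient weights are simply left in full precision. The actual AWQ method, as described in its paper, protects salient channels with per-channel scaling so that everything can still be stored in 4 bits.

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric round-to-nearest 4-bit quantization with one scale per output row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)   # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(w / scale), -8, 7)      # signed 4-bit integer grid
    return q * scale                             # dequantized weights

def quantize_protecting_salient(w, x, keep_fraction=0.01):
    """Quantize w to 4 bits, but leave the most activation-salient input channels untouched.

    Illustrative assumption: saliency = mean absolute activation per input channel.
    """
    importance = np.abs(x).mean(axis=0)
    n_keep = max(1, int(round(keep_fraction * w.shape[1])))
    salient = np.argsort(importance)[-n_keep:]   # indices of the top channels
    w_q = quantize_4bit(w)
    w_q[:, salient] = w[:, salient]              # restore salient columns in full precision
    return w_q, salient

# Toy layer: 256 input features, 32 output features, 64 calibration samples.
rng = np.random.default_rng(0)
w = rng.normal(size=(32, 256))
x = rng.normal(size=(64, 256))
outliers = [3, 77, 150]
x[:, outliers] *= 20.0                           # a few channels carry large activations

reference = x @ w.T
w_plain = quantize_4bit(w)
w_mixed, salient = quantize_protecting_salient(w, x)

err_plain = np.mean((x @ w_plain.T - reference) ** 2)
err_mixed = np.mean((x @ w_mixed.T - reference) ** 2)
print(f"plain 4-bit output MSE:       {err_plain:.4f}")
print(f"salient-protected output MSE: {err_mixed:.4f}")
```

The channels are chosen by looking at the activations `x`, not at the magnitude of the weights, which is the "activation-aware" part of the name.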

AWQ Usage

AutoAWQ

llm-awq

mit-han-lab • Updated 2023 Dec 9 3:55

Quantization

https://huggingface.co/docs/transformers/main/en/quantization#awq
