Model Compression
Mapping weights to low-precision bit representations reduces memory usage and model size and improves inference speed
- Not every layer can be quantized
- Not every model reacts the same way to quantization
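The idea above can be sketched with a minimal symmetric absmax int8 quantizer (a common baseline scheme; the function names here are illustrative, not from any specific library):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map floats into the int8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0        # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32; rounding error is at most scale/2
```

Per-tensor absmax is the simplest choice; real libraries often use per-channel scales or asymmetric zero-points to reduce error on layers that are sensitive to quantization.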
Model Quantization Notion
Model Quantization Usages
GPU Memory with quantization
Quantization
https://huggingface.co/docs/transformers/main/en/quantization
Quantization
https://huggingface.co/docs/optimum/concept_guides/quantization

Seonglae Cho