Model Compression
Mapping values to lower-precision bit representations reduces memory usage and model size and improves inference speed
- Not every layer can be quantized
- Not every model reacts the same way to quantization
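A minimal sketch of the idea, assuming symmetric per-tensor int8 quantization (one common scheme, chosen here only for illustration). The helper names `quantize_int8`, `dequantize`, and `mean_abs_error` are hypothetical, and the weights are random stand-ins rather than a real model. It shows the storage saving from the low-precision mapping, and why tensors react differently: a single outlier stretches the scale and raises the rounding error for every other value.

```python
import random
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto the int8 range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Stand-in "layer": 4096 float32 weights drawn from a normal distribution (assumption).
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(4096)))

q, scale = quantize_int8(weights)
base_error = mean_abs_error(weights, dequantize(q, scale))

# Storage per tensor: float32 items versus int8 items.
memory_ratio = (weights.itemsize * len(weights)) / (q.itemsize * len(q))

# Same tensor with one large outlier: the scale grows, so precision drops everywhere.
outlier_weights = weights[:]
outlier_weights[0] = 100.0
q_out, scale_out = quantize_int8(outlier_weights)
outlier_error = mean_abs_error(outlier_weights, dequantize(q_out, scale_out))

print(f"memory ratio: {memory_ratio}")
print(f"mean abs error without outlier: {base_error:.5f}")
print(f"mean abs error with outlier:    {outlier_error:.5f}")
```

Layers whose error rises sharply under a check like this are candidates to keep in higher precision, which is one practical reading of the two caveats above.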
Model Quantization Notion
Model Quantization Usages
Seonglae Cho