LLM.int8()
LLM.int8() is described as a mixed-precision "decomposition" because it splits each matrix multiplication into two parts, computes them at different precisions, and adds the results:
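- outlier: feature dimensions whose magnitude exceeds a threshold (6.0 by default), multiplied in fp16
- non-outlier: everything else, quantized to int8 (not fp8) with vector-wise scaling, multiplied in int8, then dequantized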
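
The split above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the bitsandbytes implementation: the function name `mixed_precision_matmul` and the helpers are hypothetical, scales are simple per-row and per-column absmax values, and the fp16 path only rounds its inputs to fp16 (via `struct`) while accumulating in ordinary Python floats.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest fp16 value (stdlib only)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def absmax_scale(values):
    """Scale that maps the largest magnitude in `values` to 127 (1.0 if empty or all zero)."""
    m = max((abs(v) for v in values), default=0.0)
    return m / 127.0 if m > 0 else 1.0

def mixed_precision_matmul(X, W, threshold=6.0):
    """Hypothetical sketch. X: tokens x hidden, W: hidden x out (lists of lists)."""
    hidden, out_dim = len(W), len(W[0])
    # A hidden dimension is an outlier if any activation in it exceeds the threshold.
    is_outlier = [any(abs(row[k]) > threshold for row in X) for k in range(hidden)]
    outliers = [k for k in range(hidden) if is_outlier[k]]
    regular = [k for k in range(hidden) if not is_outlier[k]]

    # int8 path: one scale per row of X and one per column of W (assumed absmax scaling).
    sx = [absmax_scale([row[k] for k in regular]) for row in X]
    sw = [absmax_scale([W[k][j] for k in regular]) for j in range(out_dim)]
    Xq = [[round(row[k] / sx[i]) for k in regular] for i, row in enumerate(X)]
    Wq = [[round(W[k][j] / sw[j]) for j in range(out_dim)] for k in regular]

    Y = []
    for i, row in enumerate(X):
        y_row = []
        for j in range(out_dim):
            # Integer accumulation over the non-outlier dimensions, then dequantize.
            acc = sum(Xq[i][n] * Wq[n][j] for n in range(len(regular)))
            y_int8 = acc * sx[i] * sw[j]
            # Outlier dimensions: inputs rounded to fp16, no quantization.
            y_fp16 = sum(to_fp16(row[k]) * to_fp16(W[k][j]) for k in outliers)
            y_row.append(y_int8 + y_fp16)
        Y.append(y_row)
    return Y

# Toy input: the third hidden dimension holds outliers (|x| > 6).
X = [[0.5, -1.2, 40.0, 0.3], [1.0, 0.8, -35.0, -0.7]]
W = [[0.2, -0.1], [0.4, 0.3], [0.05, -0.02], [-0.6, 0.1]]
Y = mixed_precision_matmul(X, W)
exact = [[sum(x * W[k][j] for k, x in enumerate(row)) for j in range(2)] for row in X]
print(max(abs(Y[i][j] - exact[i][j]) for i in range(2) for j in range(2)))
```

Keeping the outlier dimension out of the int8 path matters because a single large value would otherwise inflate the absmax scale of its row and crush the small values into a few quantization levels.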

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes
https://huggingface.co/blog/hf-bitsandbytes-integration

Seonglae Cho