LLM.int8()

The method is called a "decomposition" because, during the matrix multiplication, it splits the input along the feature (hidden) dimension into two parts and multiplies each part at a different precision:
outlier features: fp16
non-outlier features: int8

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes (Hugging Face blog): https://huggingface.co/blog/hf-bitsandbytes-integration
arxiv.org, LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale: https://arxiv.org/pdf/2208.07339
arxiv.org: https://arxiv.org/pdf/2209.04003
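A minimal NumPy sketch of the decomposition described above, assuming X is the (tokens x d_in) hidden-state matrix and W the (d_in x d_out) weight matrix. Feature columns of X where any value's magnitude exceeds a threshold (6.0, the outlier threshold used in the LLM.int8() paper) go through an fp16 matmul. The remaining columns use vector-wise absmax int8 quantization (one scale per row of X, one per column of W), an integer matmul accumulated in int32, and dequantization by the outer product of the scales. The names `absmax_quantize` and `int8_matmul_decomposed` are hypothetical; this illustrates the idea and is not the bitsandbytes kernel.

```python
import numpy as np


def absmax_quantize(x: np.ndarray, axis: int):
    """Symmetric absmax quantization to int8 along `axis` (one scale per vector)."""
    # Scale so that the largest |value| in each row/column maps to 127.
    # `initial=0.0` keeps the reduction valid when x has zero columns.
    scale = np.max(np.abs(x), axis=axis, keepdims=True, initial=0.0) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def int8_matmul_decomposed(X: np.ndarray, W: np.ndarray, threshold: float = 6.0) -> np.ndarray:
    """Approximate X @ W with an fp16 outlier path and an int8 regular path (hypothetical helper)."""
    # A feature dimension is an outlier if any token's value in it exceeds the threshold.
    outlier = np.any(np.abs(X) > threshold, axis=0)
    regular = ~outlier

    # fp16 path: outlier columns of X times the matching rows of W.
    out_fp16 = X[:, outlier].astype(np.float16) @ W[outlier, :].astype(np.float16)

    # int8 path: row-wise scales for X, column-wise scales for W (vector-wise quantization).
    Xq, sx = absmax_quantize(X[:, regular], axis=1)  # sx: (tokens, 1)
    Wq, sw = absmax_quantize(W[regular, :], axis=0)  # sw: (1, d_out)
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)  # integer accumulation in int32
    out_int8 = acc * (sx * sw)  # dequantize with the outer product of the scales

    # Sum the two partial products.
    return out_int8 + out_fp16.astype(np.float64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 64))
    X[:, 3] = 60.0  # inject two outlier feature dimensions (illustrative values)
    X[:, 10] = -45.0
    W = 0.1 * rng.standard_normal((64, 16))
    ref = X @ W
    for name, t in [("decomposed", 6.0), ("int8 only", np.inf)]:
        err = np.linalg.norm(int8_matmul_decomposed(X, W, t) - ref) / np.linalg.norm(ref)
        print(f"{name}: relative error {err:.4f}")
```

Passing `threshold=np.inf` marks no column as an outlier, so the function falls back to plain vector-wise int8. In the demo, the injected outliers then inflate every row scale of X, and the small regular values lose most of their precision, so the error should be noticeably larger than with the decomposition.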