Mixed-precision decomposition

Created: 2024 Jul 24 14:43
Creator: Seonglae Cho
Editor: Seonglae Cho
Edited: 2024 Jul 24 14:46
Refs: Mixed Precision

LLM.int8()

The method is called a "decomposition" because it splits the hidden-state matrix into two parts during matrix multiplication:
  • outlier feature dimensions (the small fraction of columns with large magnitudes): kept and multiplied in fp16
  • non-outlier dimensions: quantized to int8 with vector-wise absmax scaling
The int8 partial product is dequantized and summed with the fp16 outlier partial product to form the final output.
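A minimal NumPy sketch of this split (simulated, not real int8 kernels; the function name is illustrative, and the default outlier threshold of 6.0 follows the LLM.int8() paper):

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Sketch of LLM.int8()-style mixed-precision decomposition.

    X: (tokens, hidden) activations, W: (hidden, out) weights.
    Columns of X whose max |value| exceeds `threshold` are outliers
    and go through the fp16 path; the rest are quantized to int8.
    """
    outlier = np.abs(X).max(axis=0) > threshold         # outlier feature dims
    Xo, Wo = X[:, outlier], W[outlier, :]               # fp16 path
    Xn, Wn = X[:, ~outlier], W[~outlier, :]             # int8 path

    # Vector-wise absmax quantization: per-token scale for X,
    # per-output-column scale for W.
    sx = np.abs(Xn).max(axis=1, keepdims=True) / 127.0
    sw = np.abs(Wn).max(axis=0, keepdims=True) / 127.0
    sx[sx == 0] = 1.0
    sw[sw == 0] = 1.0
    Xq = np.round(Xn / sx).astype(np.int8)
    Wq = np.round(Wn / sw).astype(np.int8)

    # Accumulate in int32, then dequantize with the outer product of scales.
    int8_part = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * (sx * sw)
    # Outlier columns multiplied directly in fp16.
    fp16_part = Xo.astype(np.float16) @ Wo.astype(np.float16)
    return int8_part + fp16_part.astype(np.float64)
```

Because the outlier columns dominate the activation magnitudes, keeping just that sliver in fp16 preserves accuracy while the bulk of the multiply runs in int8.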
 
 
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes
https://huggingface.co/blog/hf-bitsandbytes-integration
https://arxiv.org/pdf/2208.07339
https://arxiv.org/pdf/2209.04003
 
 

 

Copyright Seonglae Cho