Mixed-precision decomposition

Created: 2024 Jul 24 14:43
Creator: Seonglae Cho
Editor: Seonglae Cho
Edited: 2024 Jul 24 14:46
Refs: Mixed Precision

LLM.int8()

The method is called a "decomposition" because it splits the hidden-state matrix into two parts during matrix multiplication:
  • outlier feature dimensions (the small fraction of columns with large magnitudes): kept and multiplied in fp16
  • non-outlier dimensions: quantized to int8 with vector-wise absmax scaling
The int8 partial product is dequantized and summed with the fp16 outlier partial product to form the final output.
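A minimal NumPy sketch of this split (simulated, not real int8 kernels; the function name is illustrative, and the default outlier threshold of 6.0 follows the LLM.int8() paper):

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Sketch of LLM.int8()-style mixed-precision decomposition.

    X: (tokens, hidden) activations, W: (hidden, out) weights.
    Columns of X whose max |value| exceeds `threshold` are outliers
    and go through the fp16 path; the rest are quantized to int8.
    """
    outlier = np.abs(X).max(axis=0) > threshold         # outlier feature dims
    Xo, Wo = X[:, outlier], W[outlier, :]               # fp16 path
    Xn, Wn = X[:, ~outlier], W[~outlier, :]             # int8 path

    # Vector-wise absmax quantization: per-token scale for X,
    # per-output-column scale for W.
    sx = np.abs(Xn).max(axis=1, keepdims=True) / 127.0
    sw = np.abs(Wn).max(axis=0, keepdims=True) / 127.0
    sx[sx == 0] = 1.0
    sw[sw == 0] = 1.0
    Xq = np.round(Xn / sx).astype(np.int8)
    Wq = np.round(Wn / sw).astype(np.int8)

    # Accumulate in int32, then dequantize with the outer product of scales.
    int8_part = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * (sx * sw)
    # Outlier columns multiplied directly in fp16.
    fp16_part = Xo.astype(np.float16) @ Wo.astype(np.float16)
    return int8_part + fp16_part.astype(np.float64)
```

Because the outlier columns dominate the activation magnitudes, keeping just that sliver in fp16 preserves accuracy while the bulk of the multiply runs in int8.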
 
 
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes
https://huggingface.co/blog/hf-bitsandbytes-integration
https://arxiv.org/pdf/2208.07339
https://arxiv.org/pdf/2209.04003
 
 

 

Copyright Seonglae Cho