GPTQ

Creator: Seonglae Cho
Created: 2023 Jun 7 16:08
Edited: 2023 Dec 9 6:43

One-shot weight quantization method

A post-training quantization technique in which each row of a weight matrix is quantized independently, searching for quantized weights that minimize the error against the full-precision layer output. The weights are stored in int4 but restored to fp16 on the fly during inference. This cuts memory usage by roughly 4x, because the int4 weights are dequantized inside a fused kernel rather than in the GPU’s global memory, and inference can also speed up because moving weights at a lower bitwidth takes less time.
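As a usage sketch (assuming the auto-gptq and optimum packages are installed alongside transformers, and using an illustrative model id), an already-quantized int4 GPTQ checkpoint can be loaded and run like a regular fp16 model:

```python
# Minimal sketch: load an int4 GPTQ checkpoint with Hugging Face transformers.
# The model id below is only an example; any GPTQ-quantized repo loads the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example GPTQ checkpoint (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Packed int4 weights stay in GPU memory; they are dequantized to fp16
# inside fused kernels during the forward pass.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

inputs = tokenizer("GPTQ is a one-shot weight quantization method", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```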
GPTQ Notion
GPTQ Usages
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high…
Making LLMs lighter with AutoGPTQ and transformers (Hugging Face blog)
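The blog above covers quantizing a model yourself with the transformers integration; a rough sketch using GPTQConfig (the model id and calibration dataset are placeholders, and calibration details will vary) looks like:

```python
# Rough sketch: one-shot 4-bit GPTQ quantization via transformers + auto-gptq.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,        # target bitwidth for the packed weights
    dataset="c4",  # calibration data used to minimize quantization error
    tokenizer=tokenizer,
)

# Quantization runs layer by layer while the model loads, so this takes a while.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)

quantized_model.save_pretrained("opt-125m-gptq")  # packed int4 weights on disk
```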
