Texonom / Engineering / Data Engineering / Artificial Intelligence / AI Development / AI Inference Tool / TGI / TGI Quantization

TGI Quantization

Creator: Seonglae Cho
Created: 2023 Nov 15 15:17
Editor: Seonglae Cho
Edited: 2023 Nov 17 9:17
Refs: Model Quantization
  • quant_linear
  • ExLlama
  • Quantizer
Quantize a model with GPTQ (the second argument is the output directory for the quantized weights):
text-generation-server quantize ORIGINAL_MODEL_ID OUTPUT_DIR
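As rough intuition for what a quantizer does to each weight matrix, here is a minimal sketch of symmetric per-group 4-bit round-to-nearest (RTN) quantization in plain Python. It is plain RTN, not GPTQ: GPTQ additionally adjusts the not-yet-quantized weights to compensate for rounding error. The helper names (quantize_group, dequantize_group, quantize_row) and the default group size are hypothetical and are not part of TGI's code.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns signed integer codes and the single float scale shared by the group.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit codes
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def quantize_row(row: List[float], group_size: int = 128, bits: int = 4) -> List[Tuple[List[int], float]]:
    """Split one weight row into groups and quantize each group with its own scale.

    group_size=128 is an illustrative default (hypothetical here).
    """
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


# Usage: quantize a toy row in groups of 4, then reconstruct it.
row = [0.12, -0.5, 0.33, 0.05, 1.4, -0.7, 0.0, 0.2]
groups = quantize_row(row, group_size=4)
restored = [w for codes, scale in groups for w in dequantize_group(codes, scale)]
```

Each group stores only small integer codes plus one scale, and every reconstructed weight lies within half a scale step of the original; smaller groups give tighter scales at the cost of storing more of them.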

simple

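--dtype float16 or --dtype bfloat16 (launcher option that loads the weights in half precision instead of quantizing them; it cannot be combined with --quantize)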
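To see how the two half-precision choices differ, here is a small standard-library sketch. `struct`'s `'e'` format packs IEEE 754 float16, while bfloat16 is emulated by rounding a float32 to its upper 16 bits. The helpers to_float16 and to_bfloat16 are hypothetical illustrations, not TGI code. float16 keeps more mantissa bits but overflows above 65504; bfloat16 keeps float32's exponent range with less precision.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision.

    struct raises OverflowError if |x| is too large for float16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Uses round-to-nearest-even on the 16 discarded bits; NaN handling is ignored
    in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)   # ties round to even
    bits = (bits + rounding_bias) & 0xFFFF0000    # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Usage: precision vs range.
print(to_float16(0.1))      # 0.0999755859375  (10 mantissa bits)
print(to_bfloat16(0.1))     # 0.10009765625    (7 mantissa bits)
try:
    to_float16(70000.0)
except OverflowError:
    print("70000.0 does not fit in float16")
print(to_bfloat16(70000.0))  # 70144.0 (representable, but coarsely rounded)
```

This range difference is one common reason to prefer bfloat16 when activations or weights may exceed float16's maximum, and float16 when the values are small and extra precision matters.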

Main document and quantization list

https://github.com/huggingface/text-generation-inference/blob/main/docs/source/conceptual/quantization.md

AWQ

Add AWQ quantization inference support

GPTQ

https://github.com/huggingface/text-generation-inference/tree/main/server/text_generation_server/utils/gptq

Copyright Seonglae Cho