# TGI Quantization Notes

- `quant_linear` — ExLLaMa quantizer.
- Quantize CLI (simple usage): `text-generation-server quantize ORIGINAL_MODEL_ID NEW_MODEL_ID`, with `--dtype float16` or `--dtype bfloat16`.
- Main document and quantization list: https://github.com/huggingface/text-generation-inference/blob/main/docs/source/conceptual/quantization.md
- AWQ — PR "Add AWQ quantization inference support" (updated 2023 Oct 10, 7:31).
- GPTQ — https://github.com/huggingface/text-generation-inference/tree/main/server/text_generation_server/utils/gptq