Hardware support is required to achieve better performance with quantization on GPUs. You need a device that supports Tensor Core INT8 computation, such as a T4 or A100. ONNX Runtime leverages the TensorRT Execution Provider for quantization on GPU.
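As a minimal sketch of what this looks like in practice, the snippet below configures an inference session to prefer the TensorRT Execution Provider with its INT8 mode enabled, falling back to CUDA and CPU. It assumes `onnxruntime-gpu` built with TensorRT support and a quantized model file (the filename `model.onnx` is a placeholder); the session creation is commented out so the configuration itself can be inspected without a GPU.

```python
# Sketch: provider configuration for INT8 quantization via the
# TensorRT Execution Provider. "model.onnx" is a placeholder path.
providers = [
    # TensorRT EP with INT8 kernels enabled (requires Tensor Core
    # INT8 support, e.g. T4 or A100).
    ("TensorrtExecutionProvider", {"trt_int8_enable": True}),
    # Fallbacks for operators TensorRT cannot handle.
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# With onnxruntime-gpu installed, the session would be created as:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
```

Listing fallback providers after TensorRT is the usual pattern: ONNX Runtime assigns each graph node to the first provider in the list that supports it.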