Hardware support is required to get a real speedup from quantization on GPUs: you need a device with Tensor Core int8 support, such as the T4 or A100. For GPU quantization, ONNX Runtime relies on the TensorRT Execution Provider.
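As an illustration of what int8 quantization does (a minimal sketch of symmetric per-tensor quantization, not the ONNX Runtime API itself), each float is mapped to an 8-bit integer via a scale factor, and recovered approximately on dequantization:

```python
# Illustrative sketch: symmetric int8 quantization, the numeric scheme
# that Tensor Core int8 hardware (T4, A100) accelerates.
# This is NOT ONNX Runtime code; function names here are hypothetical.

def quantize_int8(values):
    """Map floats to int8 codes: q = round(x / scale), clipped to [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats: x ≈ q * scale."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
print(q)                            # int8 codes
print(dequantize_int8(q, scale))    # approximate reconstruction
```

In practice, ONNX Runtime's quantization tooling (e.g. `quantize_static` with a calibration data reader) computes these scales from representative data, and the TensorRT Execution Provider then runs the int8 kernels on Tensor Cores.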
Quantize ONNX models
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu

Seonglae Cho