ONNX quantization pre-processing

The goal of these pre-processing steps is to improve quantization quality; a usage sketch follows the list.

  1. Symbolic shape inference: works best for transformer models.
  2. Model optimization: uses the ONNX Runtime native library to rewrite the computation graph, merging computation nodes and eliminating redundancies to improve runtime efficiency.
    1. A known issue in ONNX Runtime is that model optimization cannot output a model larger than 2 GB, so for large models this step must be skipped.
  3. ONNX shape inference.
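
A minimal sketch of running all three steps with ONNX Runtime's quantization pre-processing helper. It assumes your installed version ships `onnxruntime.quantization.shape_inference.quant_pre_process` with `skip_*` flags (check the signature in your release); the model paths are placeholders.

```python
# Sketch: symbolic shape inference, graph optimization, and ONNX shape
# inference before quantization. Assumes a recent onnxruntime release that
# provides quant_pre_process; verify the parameter names in your version.
from onnxruntime.quantization.shape_inference import quant_pre_process

quant_pre_process(
    "model.onnx",               # placeholder: model to pre-process
    "model-infer.onnx",         # placeholder: pre-processed output
    skip_optimization=False,    # set True for models larger than 2 GB (step 2 limitation)
    skip_onnx_shape=False,      # step 3: ONNX shape inference
    skip_symbolic_shape=False,  # step 1: symbolic shape inference (transformers)
)
```

Recent ONNX Runtime releases also expose this as a CLI, e.g. `python -m onnxruntime.quantization.preprocess --input model.onnx --output model-infer.onnx`; if your version provides a flag to skip optimization, use it for models above 2 GB as noted above.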

Recommendations