ONNX quantization pre-processing

Creator: Seonglae Cho
Created: 2023 Jun 4 10:09
Edited: 2023 Jun 4 10:10

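Pre-processing prepares a float32 model for quantization in three optional steps. Their goal is to improve quantization quality: the ONNX Runtime quantization tool works best when tensor shapes are known, and operator fusion during optimization makes its job easier.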

  1. Symbolic shape inference: best suited for transformer models.
  1. Model optimization: uses the ONNX Runtime native library to rewrite the computation graph, merging computation nodes and eliminating redundancies to improve runtime efficiency.
    1. A known limitation of ONNX Runtime is that model optimization cannot output a model larger than 2 GB, so optimization must be skipped for large models.
  1. ONNX shape inference: works for models that are not transformer-based.
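The 2 GB rule above can be turned into a small planning helper. This is a minimal sketch under stated assumptions: `plan_preprocessing`, `plan_for_file` and `OPTIMIZATION_SIZE_LIMIT` are hypothetical names, "2GB" is read as 2 GiB, and the check uses the size of the single `.onnx` file (a model saved with external data would also need its data files counted). Only the optimization step is ever skipped, since that is the only restriction stated here.

```python
import os

# Hypothetical constant: model optimization cannot output a model larger
# than 2 GB, interpreted here as 2 GiB (2 * 1024**3 bytes).
OPTIMIZATION_SIZE_LIMIT = 2 * 1024**3


def plan_preprocessing(model_size_bytes: int) -> dict[str, bool]:
    """Hypothetical helper: decide which pre-processing steps to skip.

    Keys mirror the keyword arguments of quant_pre_process; every step
    runs by default, and only optimization is skipped for large models.
    """
    return {
        # Step 1: symbolic shape inference (best suited for transformers).
        "skip_symbolic_shape": False,
        # Step 2: graph optimization, skipped when the model exceeds 2 GB.
        "skip_optimization": model_size_bytes > OPTIMIZATION_SIZE_LIMIT,
        # Step 3: ONNX shape inference.
        "skip_onnx_shape": False,
    }


def plan_for_file(path: str) -> dict[str, bool]:
    """Hypothetical helper: plan based on the on-disk size of one .onnx file."""
    return plan_preprocessing(os.path.getsize(path))
```

The returned keys match keyword arguments of `onnxruntime.quantization.shape_inference.quant_pre_process`, so the plan could be applied as `quant_pre_process(input_path, output_path, **plan_for_file(input_path))`. Keeping the decision in a pure function makes the size rule easy to check without loading the model.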

Recommendations