In dynamic quantization, only the model's weights are quantized ahead of time (statically). The activations are quantized on the fly during inference, with quantization parameters computed from their actual runtime values.
Dynamic Quantization
Creator: Seonglae Cho
Created: 2024 Jan 15 16:17
Editor: Seonglae Cho
Edited: 2024 Jan 15 16:18