Only a handful of values in a large language model, at most six individual weights plus one associated activation, have an outsized impact on model quality. Removing these super weights destroys the model's ability to generate coherent text: zero-shot accuracy drops to chance levels and perplexity surges.
The paper detects super weights efficiently by inspecting activation distributions (the inputs and outputs of each layer's down-projection) over a single forward pass on one input prompt.
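As a rough illustration of this detection idea, the sketch below hooks every down-projection in a Llama-style Hugging Face checkpoint and records activation spikes from one prompt. The checkpoint name, module paths (`model.model.layers[i].mlp.down_proj`), and the spike heuristic are assumptions for illustration, not the paper's exact code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

records = []  # (layer index, max |input|, max |output|) for each down_proj

def make_hook(layer_idx):
    def hook(module, inputs, output):
        x = inputs[0].detach()    # down_proj input activations
        y = output.detach()       # down_proj output activations
        records.append((layer_idx, x.abs().max().item(), y.abs().max().item()))
    return hook

handles = [
    layer.mlp.down_proj.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

with torch.no_grad():
    batch = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
    model(**batch)  # a single prompt is enough to expose the spikes

for h in handles:
    h.remove()

# A layer whose down_proj output spikes far above its input is a candidate
# for holding a super weight, located at (output spike row, input spike column).
for layer_idx, in_max, out_max in records:
    print(f"layer {layer_idx}: max|input|={in_max:.1f}  max|output|={out_max:.1f}")
```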
Super activation
A super weight produces an exceptionally large activation value (the super activation) at a specific channel and token position, and this value is carried through the residual stream to all subsequent layers.
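A rough way to see this propagation, reusing `model`, `tok`, and `batch` from the detection sketch above, is to request all hidden states and track where the largest-magnitude value sits after each layer; in models with a super weight, one channel jumps to an extreme value and stays there through the residual stream.

```python
import torch

with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

# hidden_states[0] is the embedding output; entry i > 0 follows decoder layer i.
for i, h in enumerate(out.hidden_states):
    flat = h[0].abs()                           # shape: (seq_len, hidden_dim)
    val, idx = flat.max(), flat.argmax()
    tok_pos, channel = divmod(idx.item(), flat.shape[-1])
    label = "embeddings" if i == 0 else f"after layer {i}"
    print(f"{label}: max|h|={val.item():.1f} at token {tok_pos}, channel {channel}")
```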
Application
By preserving the super weights and handling the super activation at higher precision, even simple round-to-nearest quantization maintains high quality. Preserving super weights achieves performance comparable to more sophisticated quantization techniques such as SmoothQuant, with the added advantage of requiring no calibration data.
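Below is a minimal sketch of round-to-nearest weight quantization that holds the super weight out of the quantization grid. The per-tensor INT8 absmax scheme and the `(row, col)` coordinates (assumed to come from the detection step above) are illustrative choices, not the paper's exact recipe.

```python
import torch

def rtn_quantize_preserving(w: torch.Tensor, super_coords, n_bits: int = 8):
    """Round-to-nearest quantize `w`, restoring listed super weights at full precision."""
    saved = [(r, c, w[r, c].clone()) for r, c in super_coords]  # hold out super weights
    w_clipped = w.clone()
    for r, c, _ in saved:
        w_clipped[r, c] = 0.0                  # keep outliers from inflating the scale
    qmax = 2 ** (n_bits - 1) - 1
    scale = w_clipped.abs().max() / qmax       # absmax scale without the outliers
    q = torch.clamp(torch.round(w_clipped / scale), -qmax - 1, qmax)
    w_deq = q * scale                          # simulated dequantized weights
    for r, c, v in saved:
        w_deq[r, c] = v                        # restore super weights at original value
    return w_deq

# Hypothetical usage: layer_idx, row, col are coordinates found by detection.
# w = model.model.layers[layer_idx].mlp.down_proj.weight.data
# model.model.layers[layer_idx].mlp.down_proj.weight.data = rtn_quantize_preserving(w, [(row, col)])
```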