Parameter Interpretability
The weights of a network form a single vector in parameter space. Attribution is an effect of the weights, whereas a feature is an effect of the representation. The motivation for weight similarity is to avoid components sharing parameters.
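- SVD cannot handle the Superposition Hypothesis (it yields at most as many directions as there are dimensions, and they are mutually orthogonal)
- NMF is also limited under the Superposition Hypothesis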
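A minimal sketch of the SVD limitation, under assumptions that are not from these notes: a toy matrix `W` whose rows are 5 hypothetical feature directions packed into a 2-dimensional space (more features than dimensions, as superposition posits). SVD can return at most 2 directions here, and they are orthogonal, so it cannot recover the 5 non-orthogonal feature directions one by one.

```python
import numpy as np

# Toy setup (assumed for illustration): more features than dimensions.
n_features, d_model = 5, 2

# Hypothetical feature directions: 5 unit vectors spread evenly over the plane.
angles = np.arange(n_features) * 2 * np.pi / n_features
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (5, 2)

# Reduced SVD: Vt has min(n_features, d_model) rows.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
n_directions = Vt.shape[0]

# Singular directions are orthonormal by construction.
gram_svd = Vt @ Vt.T

# The feature directions themselves overlap: off-diagonal entries are nonzero.
gram_features = W @ W.T
max_overlap = np.abs(gram_features - np.eye(n_features)).max()

print("directions found by SVD:", n_directions)
print("SVD directions orthonormal:", np.allclose(gram_svd, np.eye(d_model)))
print("largest overlap between features:", max_overlap)
```

The point is only the count and the orthogonality constraint: any basis SVD returns has fewer members than the feature set and cannot match its overlaps.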
Weight Interpretability Notion
Weight Interpretability Methods
Bilinear MLPs
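A minimal sketch of why bilinear MLPs are attractive for weight-based analysis, assuming the common form g(x) = P((Wx) ⊙ (Vx)) with no bias and no elementwise nonlinearity. The names `W`, `V`, `P` and the sizes are hypothetical, and the weights are random. Under that assumption each output is a quadratic form in the input, so the layer can be rewritten as an interaction tensor built from the weights alone.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3  # illustrative sizes, not from the notes

# Hypothetical random weights for a bias-free bilinear layer.
W = rng.normal(size=(d_hidden, d_in))
V = rng.normal(size=(d_hidden, d_in))
P = rng.normal(size=(d_out, d_hidden))


def bilinear_mlp(x):
    """Forward pass: elementwise product of two linear maps, then a projection."""
    return P @ ((W @ x) * (V @ x))


# Interaction tensor: B[a, i, j] = sum_k P[a, k] * W[k, i] * V[k, j].
B = np.einsum("ak,ki,kj->aij", P, W, V)

# Only the symmetric part contributes to x^T B_a x, so symmetrize each slice.
B_sym = 0.5 * (B + B.transpose(0, 2, 1))

x = rng.normal(size=d_in)
direct = bilinear_mlp(x)
via_tensor = np.einsum("aij,i,j->a", B_sym, x, x)

print("forward pass matches quadratic form:", np.allclose(direct, via_tensor))
```

Each slice `B_sym[a]` is a symmetric matrix, so it can be eigendecomposed to inspect which input directions interact for output `a` without running any data through the layer.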
Achille and Soatto (2018) studied the amount of information stored in the weights of deep networks.