Parameter Interpretability
The weights form a vector in parameter space. Attribution is an effect of the weights, whereas a feature is an effect of the representation. The motivation for weight similarity is to avoid components sharing parameters.
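- SVD cannot handle the superposition hypothesis: it returns at most rank-many mutually orthogonal directions, so it cannot represent more features than dimensions
- NMF is also limited under the superposition hypothesis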
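
A minimal sketch of the SVD limitation, using a toy setup that is an assumption and not taken from these notes: five hypothetical feature directions are packed into a two-dimensional space, and the names `features`, `W`, `n_features`, and `d_model` are illustrative only. SVD of the resulting matrix can return at most two directions, and those are orthogonal, while the packed features overlap.

```python
import numpy as np

# Hypothetical sizes: more features than dimensions, as superposition assumes.
n_features, d_model = 5, 2

# Hypothetical feature directions: unit vectors spread evenly over a half-circle.
angles = np.linspace(0.0, np.pi, n_features, endpoint=False)
features = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (5, 2)

# Toy weight matrix whose rows are the feature directions.
W = features

# Thin SVD: at most min(n_features, d_model) singular directions.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Number of non-negligible singular values, i.e. directions SVD can recover.
n_components = int(np.sum(S > 1e-10))

# Right singular vectors are orthonormal, so their Gram matrix is the identity.
gram = Vt @ Vt.T

# The true features are not orthogonal: largest off-diagonal overlap between them.
max_feature_overlap = np.max(np.abs(features @ features.T - np.eye(n_features)))

print("features:", n_features, "SVD directions:", n_components)
print("largest overlap between true features:", max_feature_overlap)
```

In this toy case the decomposition has fewer directions than there are features, and its orthogonality constraint does not match the overlapping feature directions, which is the sense in which SVD falls short under superposition.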
Weight Interpretability Notion
Weight Interpretability Methods
Bilinear MLPs
Achille and Soatto (2018) studied the amount of information stored in the weights of deep networks.
There is little superposition in parameter space. Linearity in parameter space is a reasonable assumption.