Parameter Interpretability
Attribution is an effect of the weights, while a feature is an effect of the representation
Weights are a vector in parameter space
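- SVD cannot handle the Superposition Hypothesis: it yields at most rank-many mutually orthogonal directions, whereas superposition posits more features than dimensions, stored in non-orthogonal directions
- NMF is likewise limited under the Superposition Hypothesis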
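A minimal sketch of the SVD limitation noted above, assuming a toy setup that is not from the original notes: five hypothetical feature directions packed into a 2-dimensional space. The names `n_features`, `d_model`, `features`, and `W` are illustrative only. SVD can return at most `d_model` orthogonal directions, so it cannot separate the five overlapping features.

```python
import numpy as np

# Hypothetical toy sizes: more features than dimensions (superposition).
n_features, d_model = 5, 2

# Hypothetical feature directions: unit vectors spread evenly over a half-circle.
angles = np.linspace(0.0, np.pi, n_features, endpoint=False)
features = np.stack([np.cos(angles), np.sin(angles)], axis=0)  # (d_model, n_features)

# Toy weight matrix whose columns are the superposed feature directions.
W = features

# Thin SVD: U has shape (d_model, d_model), S has d_model singular values.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Number of directions SVD can recover (non-negligible singular values).
n_components = int(np.sum(S > 1e-10))

# Largest overlap (absolute cosine) between two distinct feature directions.
gram = features.T @ features
max_overlap = float(np.max(np.abs(gram - np.eye(n_features))))

print("features:", n_features, "SVD components:", n_components)
print("SVD directions orthonormal:", np.allclose(U.T @ U, np.eye(d_model)))
print("max overlap between true features:", max_overlap)
```

The singular vectors are orthonormal by construction, while the true feature directions here overlap, so no choice of SVD components lines up with all of them at once.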
Weight Interpretability Methods