Fundamental Interpretability, Mech-interp
Attempting to reverse engineer the detailed computations.
One of the core challenges of mechanistic interpretability is to make neural network parameters meaningful by contextualizing them.
Investing in model architecture now may save a lot of interpretability effort in the future.
Mechanistic interpretability Notion