Ambitious interpretability (Theoretical Mechanistic Interpretability)
We want to understand the model. The idea is to decompose the activations and components of a neural network and apply causal analysis to understand them completely. This approach is often theoretical, philosophical, and mathematical.
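As a rough illustration of what "decompose and do causal analysis" can mean in practice, here is a minimal sketch of activation patching on a toy network. The network, its weights, and the names `hidden`, `output`, and `patch_effects` are all hypothetical and chosen only for illustration. Activation patching is one common form of causal analysis, not necessarily the one this note has in mind.

```python
# Hypothetical toy network: 2 inputs -> 3 hidden units (ReLU) -> 1 linear output.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]  # hidden-layer weights (3 x 2), made up
W2 = [2.0, -1.0, 0.5]                        # output weights (3), made up


def hidden(x):
    """Hidden activations: ReLU of each row of W1 dotted with the input."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h):
    """Scalar output: linear readout of the hidden activations."""
    return sum(w * hi for w, hi in zip(W2, h))


def patch_effects(clean_x, corrupt_x):
    """For each hidden unit, copy its activation from the clean run into the
    corrupted run and return the resulting change in the output."""
    h_clean = hidden(clean_x)
    h_corrupt = hidden(corrupt_x)
    baseline = output(h_corrupt)
    effects = []
    for i in range(len(h_corrupt)):
        patched = list(h_corrupt)
        patched[i] = h_clean[i]  # causal intervention on a single unit
        effects.append(output(patched) - baseline)
    return effects


effects = patch_effects(clean_x=[1.0, 0.0], corrupt_x=[0.0, 1.0])
print(effects)
```

A unit with a large effect is causally important for the difference between the two inputs, while a unit with zero effect is not. Real models need the same loop over attention heads, MLP neurons, or learned features rather than three hand-written units.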
Pragmatic Interpretability
We want to understand the model, but the driving question is how interpretability techniques can make models safer. This is an experiment-based, reductionist engineering approach.
Constructive Interpretability
We want to improve the model based on what interpretability tells us. Once we know which parts are problematic and which parts contribute to intelligence, how can we use that information to change the model's structure and achieve AGI or better models?

Seonglae Cho