Mechanistic interpretability

Creator: Seonglae Cho
Created: 2024 Apr 17 13:50
Edited: 2024 Nov 1 21:53

Fundamental Interpretability, Mech-interp

Mechanistic interpretability attempts to reverse engineer the detailed computations a neural network performs.

One of the core challenges of mechanistic interpretability is to make neural network parameters meaningful by contextualizing them.
Investing in model architecture now may save a lot of interpretability effort in the future.
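
To make the idea of contextualizing parameters concrete, here is a minimal sketch in plain Python. It assumes a hypothetical hand-built 2-2-1 ReLU network whose weights were chosen by hand so that it computes XOR; nothing here comes from a real trained model. Tabulating the hidden activations for every input shows what each neuron does: `h0` counts the active inputs and `h1` fires only when both inputs are on, so the output `h0 - 2*h1` is XOR.

```python
# Toy illustration only: weights are hypothetical and hand-picked, not learned.

def relu(x):
    return max(0.0, x)

# Hidden layer: one (weights, bias) pair per hidden neuron.
HIDDEN = [
    ((1.0, 1.0), 0.0),   # h0 = relu(a + b)      -> counts active inputs
    ((1.0, 1.0), -1.0),  # h1 = relu(a + b - 1)  -> fires only for a AND b
]
# Output layer: y = 1.0 * h0 - 2.0 * h1 + 0.0
OUTPUT = ((1.0, -2.0), 0.0)

def forward(a, b):
    """Return (hidden activations, output) for binary inputs a and b."""
    hidden = [relu(w[0] * a + w[1] * b + bias) for w, bias in HIDDEN]
    out_weights, out_bias = OUTPUT
    y = sum(v * h for v, h in zip(out_weights, hidden)) + out_bias
    return hidden, y

def activation_table():
    """Enumerate all four inputs with their hidden activations and output."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            hidden, y = forward(a, b)
            rows.append((a, b, hidden, y))
    return rows

for a, b, hidden, y in activation_table():
    print(f"a={a} b={b} hidden={hidden} y={y}")
```

In a real model the weights are learned rather than chosen, and a single neuron seldom has such a clean reading, which is part of why contextualizing parameters is hard.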
Mechanistic interpretability Notion

Chris Olah

Neel Nanda

Reading list

Overlook

