Learn Mechanistic Interpretability

Creator
Seonglae Cho
Created
2025 Feb 6 10:57
Edited
2025 Nov 27 1:51

For AI researchers getting started with interpretability

Reading list

Chris Olah

Neel Nanda

Mechanistic interpretability is the study of reverse engineering neural networks, from their learned weights down to human-interpretable algorithms. It is analogous to reverse engineering a compiled program binary back into source code.
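
As a toy illustration of going from weights to an algorithm, the sketch below uses a hand-picked two-unit ReLU network; it is not taken from any of the linked material. The weights, `network`, `hypothesis`, and `max_disagreement` are all hypothetical names and values chosen for the example. Reading the weights suggests that one hidden unit passes the positive part of the input and the other the negative part, so the guessed algorithm is `abs(x)`. The code then checks that guess against the network's behavior.

```python
def relu(z):
    """Rectified linear unit: keep positive values, zero out the rest."""
    return max(0.0, z)


# Hand-picked weights (an assumption for this sketch, not a trained model).
W_IN = [1.0, -1.0]   # hidden unit 0 sees x, hidden unit 1 sees -x
W_OUT = [1.0, 1.0]   # the output sums both hidden activations


def network(x):
    """Forward pass of the tiny one-input, two-hidden-unit ReLU network."""
    hidden = [relu(w * x) for w in W_IN]
    return sum(w * h for w, h in zip(W_OUT, hidden))


def hypothesis(x):
    """Human-readable algorithm guessed from reading the weights."""
    return abs(x)


def max_disagreement(inputs):
    """Largest gap between the network and the guessed algorithm."""
    return max(abs(network(x) - hypothesis(x)) for x in inputs)


test_inputs = [-3.0, -0.5, 0.0, 0.5, 2.0]
print(max_disagreement(test_inputs))
```

A disagreement of zero on the probe inputs supports the hypothesis but does not prove it. For real models the weights are learned, the candidate algorithm is much harder to guess, and the checking step usually involves interventions on internal activations rather than input-output comparison alone.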
History

Research

 

Recommendations