Learn Mechanistic Interpretability

Creator
Seonglae Cho
Created
2025 Feb 6 10:57
Edited
2025 Mar 10 15:50

For AI researchers getting started in interpretability

Reading list

Chris Olah

Neel Nanda

Mechanistic interpretability is the study of reverse engineering neural networks, going from the learned weights down to human-interpretable algorithms. It is analogous to reverse engineering a compiled program binary back into source code.
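
As a rough illustration of going from weights to an algorithm, here is a minimal sketch on a toy two-layer ReLU network. The weights are hand-set stand-ins for learned ones, and the helper names (`firing_set`, `name_gate`) are hypothetical, not part of any library. The idea is to check which inputs activate each hidden unit and match that pattern against known logic gates.

```python
from itertools import product

# Hypothetical hand-set weights standing in for "learned" parameters.
W1 = [[1.0, 1.0], [1.0, 1.0]]  # rows: hidden units, columns: inputs
b1 = [0.0, -1.0]               # one bias per hidden unit
W2 = [1.0, -2.0]               # output weights, one per hidden unit
b2 = 0.0

INPUTS = list(product([0, 1], repeat=2))  # all four binary input pairs


def relu(v):
    return max(0.0, v)


def hidden(x):
    """Hidden-layer activations for input pair x."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def forward(x):
    """Network output for input pair x."""
    return sum(w * h for w, h in zip(W2, hidden(x))) + b2


def firing_set(unit):
    """Inputs on which the given hidden unit is active (> 0)."""
    return {x for x in INPUTS if hidden(x)[unit] > 0}


def name_gate(active):
    """Match a firing set against a small dictionary of known gates."""
    gates = {
        "OR": {x for x in INPUTS if x[0] or x[1]},
        "AND": {x for x in INPUTS if x[0] and x[1]},
    }
    for name, pattern in gates.items():
        if active == pattern:
            return name
    return "unknown"


# Step 1: label each hidden unit by the gate its firing pattern matches.
unit_roles = [name_gate(firing_set(u)) for u in range(len(W1))]

# Step 2: record the end-to-end behaviour to compare with the hypothesis.
truth_table = {x: forward(x) for x in INPUTS}

print(unit_roles)
print(truth_table)
```

Under these assumed weights, the first hidden unit fires whenever at least one input is on and the second only when both are, and the output subtracts twice the second from the first, which reproduces XOR. Real networks are far messier than this, so treat the sketch as an analogy for the workflow rather than a method.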

Research

Recommendations