The decomposition should identify a set of components that sum to the parameters of the original network
Comprehensiveness & Plausibility
Transformer Circuit Evaluation Metrics Are Not Robust
Mechanistic interpretability work attempts to reverse engineer the learned algorithms present inside neural networks. One focus of this work has been to discover 'circuits' - subgraphs of the full...
https://openreview.net/forum?id=zSf8PJyQb2#discussion


Seonglae Cho