The decomposition should identify a set of components that sum to the parameters of the original network.
https://aclanthology.org/2022.acl-long.345.pdf
Comprehensiveness & Plausibility
https://aclanthology.org/2020.acl-main.408.pdf
Transformer Circuit Evaluation Metrics Are Not Robust
Mechanistic interpretability work attempts to reverse engineer the learned algorithms present inside neural networks. One focus of this work has been to discover 'circuits' - subgraphs of the full...
https://openreview.net/forum?id=zSf8PJyQb2#discussion


Seonglae Cho