Faithfulness Interpretability

Creator
Seonglae Cho
Created
2025 Jan 18 0:49
Editor
Seonglae Cho
Edited
2025 Apr 6 17:57
Refs
A faithful decomposition should identify a set of components that sum to the parameters of the original network.
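As a minimal sketch of this criterion, the snippet below checks whether a set of parameter components sums back to an original weight matrix and reports the mean squared reconstruction error. The helper names (`sum_components`, `faithfulness_error`) and the toy 2x2 matrices are illustrative assumptions, not taken from any cited paper or library.

```python
from typing import List

Matrix = List[List[float]]


def sum_components(components: List[Matrix]) -> Matrix:
    """Element-wise sum of equally shaped parameter components."""
    rows, cols = len(components[0]), len(components[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for comp in components:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += comp[i][j]
    return total


def faithfulness_error(original: Matrix, components: List[Matrix]) -> float:
    """Mean squared difference between the original weights and the component sum.

    Hypothetical metric for illustration: 0.0 means the components
    sum exactly to the original parameters.
    """
    total = sum_components(components)
    rows, cols = len(original), len(original[0])
    squared = sum(
        (original[i][j] - total[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return squared / (rows * cols)


# Toy example (assumed values): W splits into a diagonal and an off-diagonal part.
W = [[1.0, 2.0], [3.0, 4.0]]
P1 = [[1.0, 0.0], [0.0, 4.0]]
P2 = [[0.0, 2.0], [3.0, 0.0]]

faithful = faithfulness_error(W, [P1, P2])  # components sum to W
unfaithful = faithfulness_error(W, [P1])    # off-diagonal part is missing
```

In this toy case the full set of components reproduces `W` exactly, while dropping `P2` leaves a nonzero error, which is the kind of mismatch a faithfulness check is meant to flag.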
aclanthology.org
https://aclanthology.org/2022.acl-long.345.pdf

Comprehensiveness & Plausibility

aclanthology.org
https://aclanthology.org/2020.acl-main.408.pdf
Circuit Discovery
Transformer Circuit Evaluation Metrics Are Not Robust
Mechanistic interpretability work attempts to reverse engineer the learned algorithms present inside neural networks. One focus of this work has been to discover 'circuits' - subgraphs of the full...
https://openreview.net/forum?id=zSf8PJyQb2#discussion
 

Copyright Seonglae Cho