Starting from the output node, the algorithm traces backward through the model's computational graph and tests each node's incoming connections one at a time, temporarily severing each with Activation Patching. If cutting a connection does not significantly degrade performance, that connection is removed permanently. Repeating this process extracts a circuit that keeps only the essential connections and discards the unnecessary ones.
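The pruning loop described above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the paper's implementation: the four-node linear graph, its weights, the `run` and `discover_circuit` helpers, and the use of absolute output difference as the performance metric are all hypothetical stand-ins chosen to keep the example small.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

# Hypothetical toy graph: nodes in topological order, edges with scalar weights.
NODES: List[str] = ["in", "a", "b", "out"]
WEIGHTS: Dict[Edge, float] = {
    ("in", "a"): 1.0,
    ("in", "b"): 1.0,
    ("a", "out"): 2.0,
    ("b", "out"): 0.01,  # contributes almost nothing to the output
}


def run(
    x: float,
    removed: Optional[Set[Edge]] = None,
    corrupted: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Forward pass. A removed edge is patched: it carries the parent's
    activation from the corrupted run instead of the current run."""
    removed = removed or set()
    acts: Dict[str, float] = {"in": x}
    for node in NODES[1:]:
        total = 0.0
        for (parent, child), weight in WEIGHTS.items():
            if child != node:
                continue
            source = corrupted if (parent, child) in removed else acts
            total += weight * source[parent]
        acts[node] = total
    return acts


def discover_circuit(clean_x: float, corrupted_x: float, threshold: float) -> Set[Edge]:
    """Greedily remove edges whose patching barely changes the output."""
    corrupted = run(corrupted_x)
    baseline = run(clean_x)["out"]
    removed: Set[Edge] = set()
    current_score = 0.0  # degradation caused by the edges removed so far

    # Visit nodes from the output backward, testing each incoming edge.
    for node in reversed(NODES):
        for edge in [e for e in WEIGHTS if e[1] == node]:
            trial = removed | {edge}
            score = abs(run(clean_x, trial, corrupted)["out"] - baseline)
            # Keep the edge removed only if the extra degradation is small.
            if score - current_score < threshold:
                removed = trial
                current_score = score

    return set(WEIGHTS) - removed


circuit = discover_circuit(clean_x=1.0, corrupted_x=0.0, threshold=0.1)
print(sorted(circuit))
```

With these toy weights, the weak `b` branch is pruned and only the path through `a` survives. A larger `threshold` prunes more aggressively, at the cost of a circuit that reproduces the full model less faithfully.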
Greater-Than
NeurIPS 2023 automated circuit discovery
https://proceedings.neurips.cc/paper_files/paper/2023/file/34e1dbe95d34d7ebaf99b9bcaeb5b2be-Paper-Conference.pdf
Self Ablating Transformer using AC/DC ICLR 2025

Seonglae Cho