Starting from the output node, the method works backward along the paths that matter for the model's behavior. At each node it temporarily severs each incoming connection using activation patching and measures the effect on performance. If performance does not degrade significantly when a connection is cut, that connection is removed for good. Repeating this process over the whole graph leaves a circuit that contains only the essential connections.
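The pruning loop described above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the published implementation: the six-edge linear "model", its weights, the clean and corrupted inputs, and the helper names (`run`, `damage`, `prune_circuit`) are all hypothetical, and the metric here is the absolute change in the output value rather than whatever metric a real experiment would use.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

# Hypothetical toy graph: a linear "model" with made-up edge weights.
WEIGHTS: Dict[Edge, float] = {
    ("a", "h1"): 1.0, ("b", "h1"): 0.01,
    ("b", "h2"): 1.0, ("c", "h2"): 1.0,
    ("h1", "out"): 1.0, ("h2", "out"): 0.01,
}
TOPO_ORDER: List[str] = ["a", "b", "c", "h1", "h2", "out"]
CLEAN: Dict[str, float] = {"a": 1.0, "b": 1.0, "c": 1.0}    # assumed clean input
CORRUPT: Dict[str, float] = {"a": 0.0, "b": 0.0, "c": 0.0}  # assumed corrupted input
ALL_EDGES: Set[Edge] = set(WEIGHTS)


def parents_of(node: str) -> List[str]:
    return [p for (p, c) in WEIGHTS if c == node]


def run(inputs: Dict[str, float], kept: Set[Edge],
        patch_from: Dict[str, float]) -> Dict[str, float]:
    """Forward pass. An edge not in `kept` is patched: it carries the
    parent's activation from `patch_from` instead of the live activation."""
    acts = dict(inputs)
    for node in TOPO_ORDER:
        if node in inputs:
            continue
        acts[node] = sum(
            WEIGHTS[(p, node)] * (acts[p] if (p, node) in kept else patch_from[p])
            for p in parents_of(node)
        )
    return acts


# Activations cached from a run on the corrupted input (the patching source).
CORRUPT_ACTS = run(CORRUPT, ALL_EDGES, {})
FULL_OUT = run(CLEAN, ALL_EDGES, {})["out"]


def damage(kept: Set[Edge]) -> float:
    """How far the patched model's output drifts from the full model's."""
    return abs(run(CLEAN, kept, CORRUPT_ACTS)["out"] - FULL_OUT)


def prune_circuit(tau: float) -> Set[Edge]:
    kept = set(ALL_EDGES)
    reachable = {TOPO_ORDER[-1]}  # nodes that still have a kept path to the output
    for node in reversed(TOPO_ORDER):  # output first, inputs last
        for parent in parents_of(node):
            edge = (parent, node)
            if node not in reachable:
                kept.discard(edge)  # node is cut off from the output already
                continue
            candidate = kept - {edge}
            if damage(candidate) - damage(kept) < tau:
                kept = candidate       # cutting barely hurts: remove the edge
            else:
                reachable.add(parent)  # edge is essential: keep tracing upward
    return kept


circuit = prune_circuit(tau=0.05)
print(sorted(circuit))
```

With these made-up weights and `tau = 0.05`, only the path a -> h1 -> out survives. The threshold controls sparsity: a threshold of 0 keeps every edge in this toy, while a very large one removes them all.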
Greater-Than
NeurIPS 2023: automated circuit discovery
Self Ablating Transformer using AC/DC, ICLR 2025