Defines three ideal characteristics that circuits should have (performance preservation, role localization, and minimality) and proposes statistical hypothesis testing methods (equivalence, independence, and minimality tests)
Tests the performance difference between the circuit and the full model, the effect of removing the circuit, and the presence of unnecessary edges; also includes flexible 'sufficiency' and 'partial necessity' tests whose difficulty can be adjusted
Applied to 2 synthetic circuits (Tracr-based) and 4 circuits discovered in real Transformer models (IOI, Induction, etc.)
Synthetic circuits almost fully satisfied the ideal characteristics, but discovered circuits did not fully meet the criteria of identical performance, complete localization, and minimality
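
As a rough illustration of the equivalence idea in the points above (comparing circuit performance with full-model performance), the sketch below runs a sign-test-style check on paired per-example scores. It is a minimal sketch under stated assumptions, not the paper's exact procedure: the function names, the tolerance `eps`, the level `alpha`, and the two one-sided binomial formulation are all illustrative choices that this summary does not confirm.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def equivalence_sign_test(circuit_scores, model_scores, eps=0.1, alpha=0.05):
    """Hypothetical equivalence check on paired per-example scores.

    Null hypothesis: the circuit's win rate over the model differs from
    1/2 by at least eps. Rejecting it (p_value < alpha) supports the claim
    that neither one systematically outperforms the other.
    """
    pairs = list(zip(circuit_scores, model_scores))
    wins = sum(c > m for c, m in pairs)   # examples where the circuit scores higher
    n = sum(c != m for c, m in pairs)     # ties carry no sign information
    if n == 0:
        return {"wins": 0, "n": 0, "p_value": 1.0, "equivalent": False}

    # One-sided test against "win rate <= 1/2 - eps": large win counts reject.
    p_low = 1.0 - binom_cdf(wins - 1, n, 0.5 - eps)
    # One-sided test against "win rate >= 1/2 + eps": small win counts reject.
    p_high = binom_cdf(wins, n, 0.5 + eps)

    # Both one-sided nulls must be rejected, so report the larger p-value.
    p_value = max(p_low, p_high)
    return {"wins": wins, "n": n, "p_value": p_value, "equivalent": p_value < alpha}


if __name__ == "__main__":
    # Made-up scores: the circuit beats the model on exactly half the examples.
    circuit = [1.0 if i % 2 == 0 else 0.0 for i in range(100)]
    model = [0.5] * 100
    print(equivalence_sign_test(circuit, model, eps=0.2))
```

Raising `eps` makes the check easier to pass and lowering it makes it stricter, which loosely mirrors the adjustable difficulty mentioned for the flexible tests.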