Circuit Hypothesis

Creator: Seonglae Cho
Created: 2025 Jul 14 13:46
Edited: 2025 Jul 14 13:47
 
Defines three ideal properties that a circuit should have (performance preservation, role localization, and minimality) and proposes a statistical hypothesis test for each (equivalence, independence, and minimality tests).
The tests check, respectively, whether the circuit performs as well as the full model, what effect removing the circuit has on the model, and whether the circuit contains unnecessary edges. More flexible "sufficiency" and "partial necessity" tests are also provided, with adjustable strictness.
The tests were applied to 2 synthetic circuits (Tracr-based) and 4 circuits discovered in actual Transformer models (IOI, Induction, etc.).
The synthetic circuits almost fully satisfied the ideal properties, whereas the discovered circuits did not fully meet the criteria of identical performance, complete localization, and minimality.
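To make the equivalence idea concrete, here is a minimal sketch of comparing a circuit with the full model on paired per-example scores. It swaps in a standard two-sided sign test, which is an assumption on my part and not necessarily the statistic used in the work summarized above. The function names (`paired_wins`, `sign_test_p_value`) and the score lists are hypothetical.

```python
from math import comb


def paired_wins(circuit_scores, model_scores):
    """Count examples where the circuit beats the model, ignoring ties.

    Returns (wins, n), where n is the number of non-tied pairs.
    """
    wins = sum(c > m for c, m in zip(circuit_scores, model_scores))
    losses = sum(c < m for c, m in zip(circuit_scores, model_scores))
    return wins, wins + losses


def sign_test_p_value(wins, n):
    """Two-sided exact binomial p-value for H0: P(circuit beats model) = 0.5."""
    if n == 0:
        return 1.0  # no informative pairs, so no evidence against H0
    k = min(wins, n - wins)
    # Probability of a result at least as lopsided as k in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical per-example task scores (higher is better).
    circuit = [0.91, 0.78, 0.85, 0.60, 0.74, 0.88]
    model = [0.93, 0.80, 0.85, 0.72, 0.79, 0.90]
    wins, n = paired_wins(circuit, model)
    print(f"circuit wins {wins} of {n} non-tied pairs")
    print(f"sign-test p-value: {sign_test_p_value(wins, n):.4f}")
```

A small p-value here would suggest the circuit and the model do not perform interchangeably on the task. An adjustable tolerance around 0.5 would be one way to relax the strictness, in the spirit of the flexible tests mentioned above.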
 
 
