Circuit-Model Distance

Creator
Seonglae Cho
Created
2025 Jun 13 22:24
Edited
2025 Jun 13 22:40

CMD

Circuit-Model Distance (CMD) measures how closely the output distribution of a circuit matches that of the full model. It is computed as the area between the faithfulness curve f (the same curve that CPR, the circuit performance ratio, integrates) and the line f = 1, so a CMD of 0 is optimal.
Here the ablated nodes are the low-level nodes not in the candidate circuit. CMD measures the proportion of test-time outputs that change when each non-circuit node is individually resample-ablated.
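As a rough illustration of the "area between the curve and f = 1" reading, the sketch below integrates a faithfulness curve over circuit size with the trapezoidal rule. The function names, the normalization of circuit size to [0, 1], and the sample values are assumptions made for this example; they are not taken from the benchmark's code or results.

```python
from typing import Sequence


def trapezoid_area(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal-rule area under the points (xs, ys); xs must be increasing."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


def circuit_performance_ratio(sizes: Sequence[float], faithfulness: Sequence[float]) -> float:
    """Area under the faithfulness curve (higher is better)."""
    return trapezoid_area(sizes, faithfulness)


def circuit_model_distance(sizes: Sequence[float], faithfulness: Sequence[float]) -> float:
    """Area between the faithfulness curve and f = 1 (0 is optimal)."""
    gaps = [abs(1.0 - f) for f in faithfulness]
    return trapezoid_area(sizes, gaps)


# Hypothetical curve: circuit size as a fraction of the model, with the
# faithfulness measured at each size. These numbers are made up.
sizes = [0.0, 0.5, 1.0]
faithfulness = [0.0, 0.8, 1.0]

cmd = circuit_model_distance(sizes, faithfulness)
cpr = circuit_performance_ratio(sizes, faithfulness)
print(f"CMD = {cmd:.2f}, CPR = {cpr:.2f}")
```

When faithfulness never exceeds 1, the two areas sum to the width of the size axis. The absolute value matters only when a circuit overshoots the full model (f > 1), which CMD also counts as distance.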
 
 
 

MIB (Mechanistic Interpretability Benchmark)

Every dataset consists of (original, n counterfactuals) pairs, so each example makes explicit whether the model's outputs "should be the same" or "should be different" across the pair.
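One way to picture such a pair is as a small record holding the original input and its counterfactuals, each tagged with the expected relationship to the original output. The class names, field names, and toy prompts below are hypothetical and are not MIB's actual schema or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Counterfactual:
    prompt: str
    expect_same_output: bool  # True: output should match the original's


@dataclass(frozen=True)
class Example:
    original: str
    counterfactuals: List[Counterfactual]


def split_by_expectation(example: Example) -> Tuple[List[str], List[str]]:
    """Return (prompts expected to keep the output, prompts expected to change it)."""
    same = [c.prompt for c in example.counterfactuals if c.expect_same_output]
    different = [c.prompt for c in example.counterfactuals if not c.expect_same_output]
    return same, different


# Toy example, invented for illustration only.
example = Example(
    original="The capital of France is",
    counterfactuals=[
        Counterfactual("The capital city of France is", expect_same_output=True),
        Counterfactual("The capital of Italy is", expect_same_output=False),
    ],
)

same, different = split_by_expectation(example)
print(same)
print(different)
```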
 
 
