Circuit Performance Ratio

Creator
Seonglae Cho
Created
2025 Jun 7 15:55
Edited
2025 Jul 1 15:54

CPR

CPR measures how much of the original model's performance is recovered by a subgraph (circuit) found by selecting the well-performing parts of the model. It is calculated by integrating faithfulness across various circuit sizes (k%), i.e. as the area under the faithfulness curve.
When ground-truth circuit edges are known, they can be compared directly with the edges returned by an interpretability method; in that case the score is simply computed between the true and discovered edges.
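The area computation described above can be sketched as follows. This is a minimal illustration, not the benchmark's reference implementation: the function name `area_under_faithfulness`, the use of the trapezoidal rule, and the sample faithfulness values are all assumptions made for the example.

```python
def area_under_faithfulness(sizes, faithfulness):
    """Trapezoidal area under a faithfulness curve.

    sizes: circuit sizes as fractions in [0, 1] (k% / 100); hypothetical input.
    faithfulness: faithfulness measured at each size; hypothetical input.
    """
    if len(sizes) != len(faithfulness):
        raise ValueError("sizes and faithfulness must have the same length")
    if len(sizes) < 2:
        raise ValueError("need at least two points to integrate")

    # Sort by circuit size so the integration runs left to right.
    points = sorted(zip(sizes, faithfulness))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Area of one trapezoid between neighbouring circuit sizes.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up measurements: faithfulness at 0%, 50%, and 100% of the model.
example_sizes = [0.0, 0.5, 1.0]
example_faithfulness = [0.0, 0.8, 1.0]
example_area = area_under_faithfulness(example_sizes, example_faithfulness)
```

With sizes expressed as fractions, a circuit that fully recovers model performance at every size would give an area of 1 under this sketch, so larger values mean performance is recovered with smaller circuits.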
 
 
 
 

MIB (Mechanistic Interpretability Benchmark)

Every dataset consists of (original, n counterfactuals) pairs, which set up clear cases where the model's outputs should be the same or should differ.
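One way to picture an (original, n counterfactuals) pair is as a small record in which each counterfactual carries its expected relation to the original output. The sketch below is purely illustrative: the class names, the `should_match` field, the `expectation_rate` helper, and the toy model are hypothetical and are not taken from the benchmark's data format.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Counterfactual:
    text: str
    should_match: bool  # True if the output should equal the original's output


@dataclass(frozen=True)
class Example:
    original: str
    counterfactuals: Sequence[Counterfactual]


def expectation_rate(model: Callable[[str], str], example: Example) -> float:
    """Fraction of counterfactuals whose output relates to the original as expected."""
    if not example.counterfactuals:
        raise ValueError("example needs at least one counterfactual")
    base = model(example.original)
    hits = sum(
        (model(cf.text) == base) == cf.should_match
        for cf in example.counterfactuals
    )
    return hits / len(example.counterfactuals)


def first_word(text: str) -> str:
    # Toy stand-in for a model: its "output" is the first word of the input.
    return text.split()[0]


toy_example = Example(
    original="Alice went home",
    counterfactuals=[
        Counterfactual("Alice went away", should_match=True),
        Counterfactual("Bob went home", should_match=False),
    ],
)
toy_rate = expectation_rate(first_word, toy_example)
```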
 
 
