CPR
CPR measures how much of the original model's performance is recovered by the subgraph (circuit) built from the best-performing components. It is computed as an area: faithfulness is measured at a range of circuit sizes (k%), and the resulting curve is integrated.
… with the ground-truth circuit edges and the edges returned by an interpretability method. CPR is then simply a score comparing the true and discovered edges.
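The area computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's implementation. Faithfulness is assumed to use the common normalization f = (m(C) - m(∅)) / (m(N) - m(∅)), where m is a task metric, C the circuit, ∅ the empty circuit and N the full model. The area is assumed to be taken with the trapezoidal rule over circuit sizes given as fractions of edges kept. The function names `faithfulness` and `area_under_faithfulness` are hypothetical, and the actual set of k values and their spacing may differ.

```python
from typing import Sequence


def faithfulness(m_circuit: float, m_empty: float, m_full: float) -> float:
    """Fraction of the full model's performance recovered by a circuit.

    Assumed normalization: (m(C) - m(empty)) / (m(N) - m(empty)).
    """
    denom = m_full - m_empty
    if denom == 0:
        raise ValueError("full model and empty circuit have the same metric value")
    return (m_circuit - m_empty) / denom


def area_under_faithfulness(sizes: Sequence[float], scores: Sequence[float]) -> float:
    """Trapezoidal area under the faithfulness-vs-size curve.

    `sizes` are circuit sizes as fractions (k% / 100), strictly increasing;
    `scores` are the faithfulness values measured at those sizes.
    """
    if len(sizes) != len(scores) or len(sizes) < 2:
        raise ValueError("need at least two matching (size, score) points")
    area = 0.0
    for i in range(1, len(sizes)):
        width = sizes[i] - sizes[i - 1]
        if width <= 0:
            raise ValueError("sizes must be strictly increasing")
        # Trapezoid between consecutive sizes: width times mean height.
        area += width * (scores[i] + scores[i - 1]) / 2
    return area


# Illustrative (made-up) faithfulness values at 0%, 50% and 100% of edges.
print(area_under_faithfulness([0.0, 0.5, 1.0], [0.0, 0.8, 1.0]))
```

With these made-up points the area is about 0.65. The trapezoidal rule is only one way to integrate a curve sampled at a few sizes; a log-spaced size axis would weight small circuits more heavily.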
MIB (Mechanistic Interpretability Benchmark)
All datasets consist of pairs of an original input and n counterfactual inputs, which explicitly set up cases where the model's output should stay the same or should change.
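One way to picture such an (original, n counterfactuals) pair is the small record below. The schema is hypothetical: the class name `CounterfactualExample`, the field `output_should_change` and the arithmetic prompts are illustrative assumptions, not MIB's actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CounterfactualExample:
    """Hypothetical record: one original input plus n counterfactual inputs."""

    original: str
    counterfactuals: List[str]
    # One label per counterfactual: True if the output should change,
    # False if it should stay the same as for the original input.
    output_should_change: List[bool]

    def __post_init__(self) -> None:
        if len(self.counterfactuals) != len(self.output_should_change):
            raise ValueError("need exactly one label per counterfactual")

    @property
    def n(self) -> int:
        """Number of counterfactuals paired with the original input."""
        return len(self.counterfactuals)


# Illustrative pair: changing an operand should change the answer,
# swapping the operands should not.
example = CounterfactualExample(
    original="2 + 3 =",
    counterfactuals=["2 + 4 =", "3 + 2 ="],
    output_should_change=[True, False],
)
print(example.n)
```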