Circuit Performance Ratio

CPR

CPR measures how much of the original model's performance is recovered by a subgraph (circuit) built from the components that perform well. It is computed as the area under the faithfulness curve, integrating faithfulness across circuit sizes (k%).
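The area computation can be sketched as follows. This is a minimal illustration, not the benchmark's implementation: it assumes circuit sizes are given as fractions in [0, 1], uses the trapezoidal rule, and assumes faithfulness is normalized between an empty-circuit baseline and the full model. The function names `faithfulness` and `cpr` and the sample numbers are hypothetical.

```python
def faithfulness(circuit_score, full_score, empty_score):
    """Fraction of the full model's performance recovered by a circuit.

    Assumption: normalize between an empty-circuit baseline and the
    full model, so 0.0 means "no better than empty" and 1.0 means
    "matches the full model".
    """
    return (circuit_score - empty_score) / (full_score - empty_score)


def cpr(sizes, faithfulness_values):
    """Area under the faithfulness curve via the trapezoidal rule.

    sizes: circuit sizes as strictly increasing fractions in [0, 1].
    faithfulness_values: faithfulness measured at each size.
    """
    if len(sizes) != len(faithfulness_values) or len(sizes) < 2:
        raise ValueError("need at least two matching (size, faithfulness) points")
    area = 0.0
    for i in range(1, len(sizes)):
        width = sizes[i] - sizes[i - 1]
        if width <= 0:
            raise ValueError("sizes must be strictly increasing")
        # Trapezoid between two neighbouring circuit sizes.
        area += width * (faithfulness_values[i] + faithfulness_values[i - 1]) / 2
    return area


# Hypothetical measurements: faithfulness at 0%, 10%, 50% and 100% of edges.
sizes = [0.0, 0.1, 0.5, 1.0]
faith = [0.0, 0.6, 0.9, 1.0]
print(cpr(sizes, faith))
```

Under these assumptions, a higher value means that the method's highest-ranked components recover performance at smaller circuit sizes.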

When ground-truth circuit edges are available, they can be compared with the edges returned by an interpretability method. In that setting, CPR is simply the score between the true and discovered edges.

MIB (Mechanistic Interpretability Benchmark)

Every dataset consists of (original, n counterfactuals) pairs, which make explicit the cases where the outputs should be the same and the cases where they should differ.

mib-bench (Mechanistic Interpretability Benchmark)

Principled evaluation of mechanistic interpretability methods.

https://huggingface.co/mib-bench

https://arxiv.org/pdf/2504.13151
