Circuit-based Reasoning Verification
- For each CoT step, construct an attribution graph: the causal connections among input tokens, intermediate features, and output logits.
- Vectorize the graph's statistics, node activations, and topology features into a "structural fingerprint", which serves as input for a Gradient Boosting Classifier to predict whether the step is correct or incorrect.
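The two steps above can be sketched roughly as follows. This is a minimal illustration, not the CRV implementation: the `Edge` type, the `fingerprint` function, the node names, and the particular seven features are all hypothetical choices made for the example. A real attribution graph would come from an interpretability toolchain rather than being written by hand.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Edge:
    src: str
    dst: str
    weight: float  # attribution strength from src to dst (hypothetical field)


def fingerprint(activations: dict[str, float], edges: list[Edge]) -> list[float]:
    """Flatten one step's attribution graph into a fixed-length feature vector."""
    n_nodes = len(activations)
    n_edges = len(edges)

    # Global graph statistic: directed edge density.
    density = n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0

    # Node activation statistics.
    acts = list(activations.values())
    act_mean = mean(acts) if acts else 0.0
    act_max = max(acts, default=0.0)

    # Influence (total absolute outgoing weight) and in-degree per node.
    influence: dict[str, float] = {}
    in_degree: dict[str, int] = {}
    for e in edges:
        influence[e.src] = influence.get(e.src, 0.0) + abs(e.weight)
        in_degree[e.dst] = in_degree.get(e.dst, 0) + 1
    infl_max = max(influence.values(), default=0.0)
    max_in = max(in_degree.values(), default=0)  # simple topology feature

    return [float(n_nodes), float(n_edges), density,
            act_mean, act_max, infl_max, float(max_in)]


# Toy graph for a single step (all values invented for illustration).
activations = {"tok:3": 1.0, "tok:+": 1.0, "feat:add": 0.8,
               "feat:carry": 0.2, "logit:7": 2.0}
edges = [
    Edge("tok:3", "feat:add", 0.5),
    Edge("tok:+", "feat:add", 0.7),
    Edge("feat:add", "logit:7", 0.9),
    Edge("feat:carry", "logit:7", -0.1),
]
vec = fingerprint(activations, edges)
```

One such vector per step, paired with a correct/incorrect label, could then be used to train a classifier such as scikit-learn's `GradientBoostingClassifier`.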
CRV achieves AUROC above 92% across all datasets including Arithmetic, surpassing existing black-box and gray-box methods. However, the error signatures are strongly domain-specific → error patterns differ across logical, arithmetic, and GSM8K tasks. Among graph features, node activation and influence statistics have the highest predictive power. Directly intervening on error-causing features (suppressing or amplifying their activations) corrects the reasoning path → causal evidence that these features drive the errors.
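For reference, AUROC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. It is usually reported on a 0 to 1 scale, so the figure above presumably corresponds to about 0.92. A small pairwise sketch, assuming label 1 marks an incorrect step and using invented scores:

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the number of examples,
    which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented data: 1 = step is incorrect (assumed convention), 0 = step is correct.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
result = auroc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```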

Seonglae Cho