Circuit Stability

Creator
Seonglae Cho
Created
2025 Jul 29 14:30
Edited
2025 Jul 29 14:47

The degree to which a language model uses consistent computational circuits across diverse inputs for a task


Circuit Stability Characterizes Language Model Generalization

The author extracts circuits with attention patching. Unlike previous approaches that use binary hard circuits, the paper introduces soft circuits, which make the analysis more tractable and enable numerical analysis. Circuits are represented as a graph G = (V, E) (Elhage-style), and circuit importance is measured as the change in KL loss. Although nearly every edge is connected in the full graph, edges are pruned based on a threshold, and behavior changes significantly within a specific threshold range.
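A minimal sketch of the threshold pruning described above. Reading "change in KL loss" as the KL divergence between the clean and patched output distributions is an assumption of this sketch, not a detail confirmed by the note. The edge names, scores, and thresholds are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def prune_soft_circuit(edge_scores, threshold):
    """Keep only edges whose soft importance score reaches the threshold."""
    return {edge: s for edge, s in edge_scores.items() if s >= threshold}

# Assumed reading: an edge's importance is how far patching it moves the
# output distribution, measured here with KL divergence.
clean_probs = [0.7, 0.2, 0.1]
patched_probs = [0.4, 0.4, 0.2]
example_importance = kl_divergence(clean_probs, patched_probs)

# Hypothetical soft circuit: every edge of G = (V, E) carries a real-valued
# score instead of a binary in/out label.
edge_scores = {
    ("embed", "a0.h1"): 0.82,
    ("a0.h1", "a1.h0"): 0.40,
    ("a0.h1", "mlp1"): 0.03,
    ("a1.h0", "logits"): 0.65,
    ("mlp1", "logits"): 0.01,
}

# Sweep the threshold and record how many edges survive at each value.
kept_per_threshold = {
    t: len(prune_soft_circuit(edge_scores, t)) for t in (0.0, 0.05, 0.5, 0.9)
}
```

Because the scores are continuous, the sweep shows where the surviving edge count drops, which is the kind of threshold range the summary refers to.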
The author then builds an adjacency matrix by binarizing the circuit similarity (Spearman ρ) between subtasks. In the clustering results on this matrix, five circuit families, including equal-digit, single-digit, and one-digit-diff, separate naturally. Each group therefore represents a type of problem that reuses a consistent circuit structure.
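The similarity step can be sketched as follows, assuming each subtask's soft circuit is a vector of edge scores over the same edges. The note does not say which clustering method is used, so this sketch substitutes connected components of the binarized graph. The cutoff of 0.7 and the four score vectors are hypothetical.

```python
import numpy as np

def rank(values):
    """Return 0-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values), dtype=float)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a, b):
    """Spearman rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

def similarity_adjacency(circuits, cutoff):
    """Binarize pairwise Spearman similarity into a boolean adjacency matrix."""
    n = len(circuits)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            adj[i, j] = spearman(circuits[i], circuits[j]) >= cutoff
    return adj

def connected_components(adj):
    """Label each node with the index of its connected component."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return labels

# Hypothetical edge-score vectors for four subtasks over the same five edges.
circuits = [
    [0.9, 0.7, 0.1, 0.3, 0.5],
    [0.8, 0.6, 0.2, 0.3, 0.4],
    [0.1, 0.2, 0.9, 0.7, 0.4],
    [0.2, 0.1, 0.8, 0.9, 0.5],
]
adjacency = similarity_adjacency(circuits, cutoff=0.7)
families = connected_components(adjacency)
```

Subtasks that share a label in `families` rank their edges similarly, which is the sense in which a group reuses one circuit structure.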
