Circuit Discovery

Causal abstraction

Circuit Discovery Methods

Circuit Tracing

Sparse Feature Circuit

Attention-Causal Communication

L3D

Attribution Patching

ACDC

Feature Cluster Resampling

Circuit Discovery Usage

Attribution Graph

Circuit Performance Ratio

Circuit-Model Distance

InterpBench

Circuit Stability

Zoom In: An Introduction to Circuits

By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.

https://distill.pub/2020/circuits/zoom-in/

curve circuit (2020)

Curve Circuits

Reverse engineering the curve detection algorithm from InceptionV1 and reimplementing it from scratch.

https://distill.pub/2020/circuits/curve-circuits/

A Stanford lecture video on mechanistic interpretability that aims for causal grounding.

Causal Mechanistic Interpretability (Stanford lecture 1) - Atticus Geiger

How can we use the language of causality to understand and edit the internal mechanisms of AI models? Atticus Geiger (Goodfire) gives a guest lecture on applying frameworks and tools from causal modeling to understand LLMs and other neural networks in Surya Ganguli's Stanford course APPPHYS 293.

00:00 - Intro
01:51 - Activation steering (e.g. Golden Gate Claude)
10:23 - Causal mediation analysis (understanding the contribution of an intermediate component)
21:42 - Causal abstraction methods (explaining a complex causal system with a simple one)
26:11 - Interchange interventions
40:46 - Distributed Alignment Search
54:54 - Lookback mechanisms: a case study in designing counterfactuals

Read more about our research: https://www.goodfire.ai/research
Follow us on X: https://x.com/GoodfireAI

https://www.youtube.com/watch?v=78Xa8VkH7-g&pp=0gcJCU0KAYcqIYzv
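
The interchange interventions listed in the lecture outline can be illustrated with a minimal sketch. Everything here is a hypothetical toy, not code from the lecture: `low_level` is a hand-written stand-in for a network with two hidden units, and `high_level` is a candidate causal model with one intermediate variable S = x0 + x1. The idea is to run the network on a source input, copy one hidden unit into a run on a base input, and check whether the patched output matches what the high-level model predicts when S is set to its source value.

```python
import itertools

def low_level(x, patch=None):
    """Toy 'network' (hypothetical): returns (hidden, output).

    `patch` maps a hidden-unit index to a value that overrides it.
    """
    hidden = [x[0] + x[1], x[2]]
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value
    output = hidden[0] > hidden[1]
    return hidden, output

def high_level(x, s_override=None):
    """Candidate causal model: S = x0 + x1, Y = (S > x2)."""
    s = x[0] + x[1] if s_override is None else s_override
    return s > x[2]

def interchange_accuracy(inputs, unit):
    """Fraction of (base, source) pairs where patching `unit` from the
    source run into the base run matches the high-level counterfactual."""
    hits = total = 0
    for base, source in itertools.product(inputs, repeat=2):
        src_hidden, _ = low_level(source)
        _, patched_out = low_level(base, patch={unit: src_hidden[unit]})
        expected = high_level(base, s_override=source[0] + source[1])
        hits += patched_out == expected
        total += 1
    return hits / total

inputs = list(itertools.product(range(3), repeat=3))
acc_aligned = interchange_accuracy(inputs, unit=0)     # unit 0 encodes S
acc_misaligned = interchange_accuracy(inputs, unit=1)  # unit 1 does not
print(acc_aligned, acc_misaligned)
```

In this toy, hidden unit 0 is a perfect stand-in for S, so its interchange accuracy is 1.0, while aligning S with unit 1 fails on some pairs. Real networks rarely expose such a clean unit, which is the motivation for searching over distributed representations.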

It is a common belief that the predictive power of networks leveraging softmax arises from “circuits” which sharply perform certain kinds of computations consistently across many diverse inputs. However, for these circuits to be robust, they would need to generalise well to arbitrary valid inputs. In this paper, we dispel this myth: even for tasks as simple as finding the maximum key, any learned circuitry must disperse as the number of items grows at test time. We attribute this to a fundamental limitation of the softmax function to robustly approximate sharp functions with increasing problem size, and prove this phenomenon theoretically.


https://arxiv.org/pdf/2410.01104
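
The dispersion effect described in the abstract can be seen in a small numerical sketch. It assumes one "maximum" item whose logit exceeds all others by a fixed gap that does not grow with the number of items. The gap value and the helper names are illustrative assumptions, not taken from the paper. Under that assumption the softmax weight on the maximum item is e^gap / (e^gap + n - 1), which shrinks toward zero as n grows.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_item_weight(n_items, gap=5.0):
    """Softmax weight on the single largest item.

    Assumption (illustrative): the largest logit is `gap` above
    every other logit, and `gap` stays fixed as n_items grows.
    """
    logits = [gap] + [0.0] * (n_items - 1)
    return softmax(logits)[0]

weights = {n: max_item_weight(n) for n in (10, 100, 1000, 10000)}
for n, w in weights.items():
    print(f"n={n:>6}  weight on max item = {w:.4f}")
```

With a fixed gap, the weight on the maximum item falls monotonically as the item count rises, so an attention head that looks sharp at small sizes becomes diffuse at larger ones.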
