AI Neural Circuit

Computational subgraph of a neural network.

A limitation of circuit analysis is that it tends to focus on a single circuit or an individual mechanism at a time. In the case of attention, each head operates independently and its output is combined additively with the outputs of the other heads, so it is difficult to explain every complex mechanism through interactions between heads alone. As a result, the trend is moving away from circuits and toward studying the independent functions of individual attention heads or of SAEs (sparse autoencoders) themselves.
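The additive, per-head structure of attention described above can be checked numerically. The following is a minimal sketch in plain Python, not taken from the source: the shapes, the random per-head outputs, and the helper names (`matmul`, `add`, `head_outputs`, `w_o`) are all illustrative assumptions. It shows that concatenating head outputs and applying one output projection gives the same result as summing each head's separately projected contribution.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

random.seed(0)
# Hypothetical toy sizes, chosen only for illustration.
n_tokens, n_heads, d_head, d_model = 3, 2, 4, 8

# Assumed per-head outputs (attention pattern already applied to the values):
# one [n_tokens, d_head] matrix per head.
head_outputs = [
    [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(n_tokens)]
    for _ in range(n_heads)
]

# Shared output projection of shape [n_heads * d_head, d_model].
w_o = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_heads * d_head)]

# Standard view: concatenate heads along the feature axis, then project once.
concat = [
    [value for h in range(n_heads) for value in head_outputs[h][t]]
    for t in range(n_tokens)
]
combined = matmul(concat, w_o)

# Additive view: each head uses only its own block of rows of w_o,
# and the per-head contributions are summed.
per_head = [
    matmul(head_outputs[h], w_o[h * d_head:(h + 1) * d_head])
    for h in range(n_heads)
]
summed = per_head[0]
for contribution in per_head[1:]:
    summed = add(summed, contribution)

# The two views should agree up to floating-point rounding.
max_diff = max(
    abs(x - y)
    for row_c, row_s in zip(combined, summed)
    for x, y in zip(row_c, row_s)
)
print(max_diff)
```

Because each entry of `per_head` depends only on its own head's output, a head's contribution can be inspected in isolation, which is one way to read the shift toward analyzing heads individually.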

AI Neural Circuit Notion

AI Neural Circuit history

QK/OV Circuit

Sparse Feature Circuit

Isolating circuit paths

Circuits Updates - April 2024

We report a number of developing ideas on the Anthropic interpretability team, which might be of interest to researchers working actively in this space. Some of these are emerging strands of research where we expect to publish more on in the coming months. Others are minor points we wish to share, since we're unlikely to ever write a paper about them.

https://transformer-circuits.pub/2024/april-update/index.html#circuit-path-lengths

