AI Neural Circuit

Creator
Seonglae Cho
Created
2024 Apr 2 4:45
Edited
2024 Jul 8 8:17

Computational subgraph of a neural network.

An Introduction to Circuits (OpenAI 2020) Chris Olah
In the paper above, OpenAI makes claims about Superposition and the AI Neural Circuit, and restates the Universality Hypothesis.
 
A Mathematical Framework for Transformer Circuits (Anthropic 2021) Nelson Elhage
 
 
Toward Transparent AI (2022 July) Tilman Rauker
 
INTERPRETABILITY IN THE WILD (2022 Nov) Kevin Wang
The paper studies indirect object identification in GPT-2 small; the circuit it describes roughly implements the following steps:
  1. Identify all previous names in the sentence (Mary, John, John).
  2. Remove all names that are duplicated (in the example above: John).
  3. Output the remaining name (Mary).
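The three steps above can be sketched as ordinary code. This is only a behavioral illustration, assuming the names have already been extracted from the sentence; the function name `indirect_object` is hypothetical, and the real circuit implements these steps with attention heads rather than a dictionary.

```python
def indirect_object(names_in_order):
    """Return the single name that is not duplicated among the given names."""
    # Step 1: identify all previous names and count how often each appears.
    counts = {}
    for name in names_in_order:
        counts[name] = counts.get(name, 0) + 1

    # Step 2: remove every name that is duplicated.
    remaining = [name for name in names_in_order if counts[name] == 1]

    # Step 3: output the remaining name (the sketch assumes exactly one is left).
    if len(remaining) != 1:
        raise ValueError("expected exactly one non-duplicated name")
    return remaining[0]


result = indirect_object(["Mary", "John", "John"])
print(result)
```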
 
 

QK, OV matrices (within a single head)

Attention heads can be understood as having two largely independent computations.
The OV and QK matrices are extremely low-rank. Copying behavior is widespread in OV matrices and is arguably one of the most interesting behaviors (it is used for token shifting and by induction heads).
The key point about the circuit is that each attention step involves two tokens, a source and a destination, as follows.
Previous Token Head (source attention) → Induction head (destination attention)
The attention pattern is a function of both the source and destination token, but once a destination token has decided how much to attend to a source token, the effect on the output is solely a function of that source token.
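The separation between the two computations can be sketched with a toy single head. The sizes, the random weights, and the variable names (`W_QK`, `W_OV`, `per_source_output`) are assumptions made for illustration, not values from any real model. The attention pattern depends on both destination and source, while the vector written per attended source depends on the source alone, and both combined matrices have rank at most `d_head`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 4   # toy sizes, chosen only for illustration

x = rng.normal(size=(n_tokens, d_model))   # residual stream, one row per token
W_Q = rng.normal(size=(d_head, d_model))
W_K = rng.normal(size=(d_head, d_model))
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))

W_QK = W_Q.T @ W_K   # (d_model, d_model), rank <= d_head
W_OV = W_O @ W_V     # (d_model, d_model), rank <= d_head

# QK circuit: scores[dst, src] = x_dst^T W_QK x_src, with the usual scaling.
scores = x @ W_QK @ x.T / np.sqrt(d_head)
causal = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))  # dst may only see src <= dst
scores = np.where(causal, scores, -np.inf)
pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
pattern /= pattern.sum(axis=-1, keepdims=True)               # softmax over source positions

# OV circuit: what each source token would write if it were attended to.
per_source_output = x @ W_OV.T   # row s equals W_OV @ x[s]; no destination involved

# Head output: the pattern only mixes the per-source outputs.
head_out = pattern @ per_source_output

print(np.linalg.matrix_rank(W_QK), np.linalg.matrix_rank(W_OV))
```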

1. QK Circuit

How each attention head's attention pattern is computed (same pattern matching)
  • preceding tokens → attended token
In fact, information about the attended token itself is largely irrelevant to calculating the attention pattern for induction. Note that the attended token is ignored only when the attention pattern is calculated through the QK circuit. The attended token is extremely important for calculating the head's output through the OV circuit. (The part of the head that calculates the attention pattern and the part that calculates the output if attended to are separable, and it is often useful to consider them independently.)

2. OV Circuit

Copying is done by the OV ("Output-Value") circuit. 
Transformers seem to have quite a number of copying heads (Attention head), of which induction heads are a subset.
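One way to check whether an OV circuit copies is to look at the sign of its eigenvalues: a map that sends tokens back to themselves has mostly positive eigenvalues. The sketch below is a minimal illustration, assuming a summary statistic of the form sum(λ) / sum(|λ|), which is not defined in this note; `copying_score` is a hypothetical helper, and the matrices are toy stand-ins for a token-to-token OV circuit.

```python
import numpy as np


def copying_score(ov_circuit):
    """Sum of eigenvalues over sum of their magnitudes; near 1 means mostly positive."""
    eigvals = np.linalg.eigvals(ov_circuit)
    # The eigenvalue sum of a real matrix equals its trace, so the real part is exact.
    return float(np.real(eigvals.sum()) / np.abs(eigvals).sum())


rng = np.random.default_rng(0)
n_vocab = 8   # toy vocabulary size (assumption)

# Toy "copying" circuit: close to the identity, so each token boosts itself.
copying = np.eye(n_vocab) + 0.05 * rng.normal(size=(n_vocab, n_vocab))
# Toy non-copying circuit: an unstructured random map.
random_map = rng.normal(size=(n_vocab, n_vocab))

print(copying_score(copying))
print(copying_score(random_map))
```

A score near 1 suggests copying, a score near -1 suggests the head suppresses the attended token, and values around 0 suggest no consistent copying behavior.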

Path Expansion Trick for Multi-layer Attention with composition

The most basic form of an induction head uses pure K-composition with an earlier "previous token head" to create a QK-circuit term of the form Id ⊗ h_prev ⊗ W, where W has positive Eigenvalues. This term causes the induction head to compare the current token with every earlier position's preceding token and look for places where they are similar. More complex QK-circuit terms can be used to create induction heads which match on more than just the preceding token.
Although the paper does not state it clearly, in a specific single-layer form, or in the multi-layer case where the residual stream (latent space) is altered by the token embedding or by Q- and K-composition, an induction head with a similar Eigenvector increases the probability of the corresponding token in the output distribution.
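The K-composition mechanism described above can be sketched with one-hot tokens. This is a toy under stated assumptions: the previous token head is an exact shift matrix `A_prev`, the matching matrix `W` is the identity (so all its eigenvalues are positive), and the OV circuit is assumed to copy the attended token perfectly. None of these values come from a trained model.

```python
import numpy as np

tokens = ["a", "b", "c", "d", "a"]   # the destination is the final "a"
vocab = sorted(set(tokens))
X = np.array([[float(t == v) for v in vocab] for t in tokens])  # one-hot rows

n = len(tokens)
A_prev = np.eye(n, k=-1)   # previous token head: position i attends to position i-1
K = A_prev @ X             # K-composition: each key carries the preceding token
Q = X                      # each query carries the current token

W = np.eye(len(vocab))     # toy "same matching" matrix with positive eigenvalues
scores = Q @ W @ K.T       # scores[dst, src] = 1 when tokens[dst] == tokens[src - 1]

# Keep only strictly earlier source positions.
earlier = np.tril(np.ones((n, n), dtype=bool), k=-1)
scores = np.where(earlier, scores, 0.0)

dst = n - 1
src = int(np.argmax(scores[dst]))   # position whose preceding token matches the current one
predicted = tokens[src]             # a copying OV circuit would boost this token

print(src, predicted)
```

Here the destination "a" matches the position right after the earlier "a", so the attended token is "b", which is the [a][b] … [a] → [b] pattern.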

Token Definitions

The QK circuit determines which "source" token the present "destination" token attends back to and copies information from, while the OV circuit describes what the resulting effect on the "out" predictions for the next token is.
[source]... [destination][out]
  • preceding tokens - the attention pattern is a function of all possible source tokens, from the start of the context up to the destination token.
  • source token - the attended token, a specific previous token that the induction head attends to. The attended token needs to contain information about the tokens preceding it, since that is the information being read.
  • destination token - the current token, where information is written.
  • output token - the predicted token after the destination token, which is similar to the source token.

Composition

  • One layer model copying head: [b] … [a] → [b]
    • And when rare quirks of tokenization allow: [ab] … [a] → [b]
  • Two layer model induction head: [a][b] … [a] → [b]
For the next layer's QK-circuit, both Q-composition and K-composition come into play, with previous-layer attention heads potentially influencing the construction of the keys and queries.
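The difference between the two patterns listed above can be sketched at the behavioral level. Both function names are hypothetical, and the code only mimics what each kind of head tends to predict; it says nothing about how the weights implement it.

```python
def copying_head_candidates(tokens):
    """One-layer copying, [b] … [a] → [b]: any earlier token may be boosted as the next token."""
    return set(tokens[:-1])


def induction_head_prediction(tokens):
    """Two-layer induction, [a][b] … [a] → [b]: predict what followed the last earlier copy."""
    current = tokens[-1]
    # Scan earlier positions, most recent first, for the same token as the current one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None   # the current token has not appeared before


context = ["a", "b", "c", "a"]
print(copying_head_candidates(context))
print(induction_head_prediction(context))
```

The copying head has no way to prefer "b" over "c", because it cannot see which token followed the earlier "a"; the induction head can, since composition with the previous layer gives it that information.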
 
 

GPT2 circuit analysis

Anthropic

OpenAI

 
 
