AI Neural Circuit history

Creator: Seonglae Cho
Created: 2024 Oct 14 11:19
Edited: 2025 Feb 4 17:13

An Introduction to Circuits (OpenAI 2020) Chris Olah
In this paper, OpenAI puts forward claims about Superposition and the AI Circuit, and restates the Universality Hypothesis.
 
A Mathematical Framework for Transformer Circuits (Anthropic 2021) Nelson Elhage
 
 
Toward Transparent AI (2022 July) Tilman Rauker
 
INTERPRETABILITY IN THE WILD (2022 Nov) Kevin Wang
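The paper describes the Indirect Object Identification (IOI) circuit in GPT-2 small as a three-step algorithm:
  1. Identify all previous names in the sentence (Mary, John, John).
  2. Remove all names that are duplicated (in the example above: John).
  3. Output the remaining name (Mary).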
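
The three steps above can be sketched in plain Python. This is only an illustration of the algorithm as listed, not the mechanism inside GPT-2 small, which the paper attributes to interacting attention heads. The function name `indirect_object` is hypothetical, and the sketch assumes the names have already been extracted from the prompt in order of appearance.

```python
from collections import Counter


def indirect_object(names):
    """Return the one name in `names` that is not repeated.

    `names` lists the names in the order they appear in the prompt,
    e.g. ["Mary", "John", "John"]. Extracting names from raw text
    (step 1) is assumed to be done by the caller.
    """
    # Count how often each name occurs in the prompt.
    counts = Counter(names)
    # Step 2: remove every name that occurs more than once.
    remaining = [name for name in names if counts[name] == 1]
    # Step 3: output the remaining name; the task assumes exactly one is left.
    if len(remaining) != 1:
        raise ValueError(f"expected exactly one unrepeated name, got {remaining}")
    return remaining[0]


print(indirect_object(["Mary", "John", "John"]))  # Mary
```

The explicit error for prompts without exactly one unrepeated name is a design choice of this sketch; the listed algorithm only covers the case where a single name remains.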
 
 
 

GPT2 circuit analysis

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small — AI Alignment Forum
To learn more about this work, check out the paper. We assume general familiarity with transformer circuits. …
arxiv.org

Anthropic

Transformer Circuits Thread
Can we reverse engineer transformer language models into human-understandable computer programs? Inspired by the Distill Circuits Thread, we're going to try.

OpenAI

Thread: Circuits
What can we learn if we invest heavily in reverse engineering a single neural network?

One-layer skip trigram

One-layer transformers aren’t equivalent to a set of skip-trigrams — LessWrong
(thanks to Tao Lin and Ryan Greenblatt for pointing this out, and to Arthur Conmy, Jenny Nitishinskaya, Thomas Huck, Neel Nanda, and Lawrence Chan, B…
 
 
 
