



- Identify all previous names in the sentence (Mary, John, John).
- Remove all names that are duplicated (in the example above: John).
- Output the remaining name (Mary).
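The three steps above can be sketched as a small symbolic procedure. This is only an illustration of the described algorithm on plain strings, not of how the attention heads in GPT-2 small implement it. The function names, the `known_names` set, and the example sentence are assumptions made for this sketch.

```python
from collections import Counter
from typing import List, Optional, Set


def find_names(sentence: str, known_names: Set[str]) -> List[str]:
    """Step 1: collect every known name, in order of appearance."""
    # Naive whitespace tokenization with punctuation stripped (an assumption;
    # the model itself works on subword tokens, not on words).
    tokens = [tok.strip(".,!?;:") for tok in sentence.split()]
    return [tok for tok in tokens if tok in known_names]


def predict_indirect_object(names: List[str]) -> Optional[str]:
    """Steps 2 and 3: drop duplicated names, then output what remains."""
    counts = Counter(names)
    remaining = [name for name in names if counts[name] == 1]
    # Return None when no unique name is left (e.g. every name is repeated).
    return remaining[0] if remaining else None


# Hypothetical example sentence chosen to yield the names (Mary, John, John).
sentence = "When Mary and John went to the store, John gave a drink to"
names = find_names(sentence, known_names={"Mary", "John"})
print(names)                           # ['Mary', 'John', 'John']
print(predict_indirect_object(names))  # Mary
```

Keeping step 1 separate from steps 2 and 3 mirrors the bullet list: name detection is one concern, and duplicate removal plus output is another.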
GPT-2 circuit analysis
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small — AI Alignment Forum
To learn more about this work, check out the paper. We assume general familiarity with transformer circuits. …
https://www.alignmentforum.org/posts/3ecs6duLmTfyra3Gp/some-lessons-learned-from-studying-indirect-object
Anthropic
Transformer Circuits Thread
Can we reverse engineer transformer language models into human-understandable computer programs?
Inspired by the Distill Circuits Thread, we're going to try.
https://transformer-circuits.pub/
OpenAI
Thread: Circuits
What can we learn if we invest heavily in reverse engineering a single neural network?
https://distill.pub/2020/circuits/
One-layer skip trigram
One-layer transformers aren’t equivalent to a set of skip-trigrams — LessWrong
(thanks to Tao Lin and Ryan Greenblatt for pointing this out, and to Arthur Conmy, Jenny Nitishinskaya, Thomas Huck, Neel Nanda, and Lawrence Chan, B…
https://www.lesswrong.com/posts/b5HNYh9ne5vEkX5ag/one-layer-transformers-aren-t-equivalent-to-a-set-of-skip


Seonglae Cho