An Introduction to Circuits (OpenAI, 2020), Chris Olah
In this paper, OpenAI describes superposition and AI circuits, and restates the Universality Hypothesis.

A Mathematical Framework for Transformer Circuits (Anthropic, 2021), Nelson Elhage

Toward Transparent AI (July 2022), Tilman Rauker

Interpretability in the Wild (Nov 2022), Kevin Wang
1. Identify all previous names in the sentence (Mary, John, John).
2. Remove all names that are duplicated (in the example above: John).
3. Output the remaining name (Mary).

GPT2 circuit analysis

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small (AI Alignment Forum)
To learn more about this work, check out the paper. We assume general familiarity with transformer circuits. …
https://www.alignmentforum.org/posts/3ecs6duLmTfyra3Gp/some-lessons-learned-from-studying-indirect-object

arxiv.org
https://arxiv.org/pdf/2211.00593.pdf

Anthropic: Transformer Circuits Thread
Can we reverse engineer transformer language models into human-understandable computer programs? Inspired by the Distill Circuits Thread, we're going to try.
https://transformer-circuits.pub/

OpenAI: Thread: Circuits
What can we learn if we invest heavily in reverse engineering a single neural network?
https://distill.pub/2020/circuits/

One-layer skip trigram

One-layer transformers aren't equivalent to a set of skip-trigrams (LessWrong)
(thanks to Tao Lin and Ryan Greenblatt for pointing this out, and to Arthur Conmy, Jenny Nitishinskaya, Thomas Huck, Neel Nanda, and Lawrence Chan, B…
https://www.lesswrong.com/posts/b5HNYh9ne5vEkX5ag/one-layer-transformers-aren-t-equivalent-to-a-set-of-skip
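
The three-step name procedure listed under Interpretability in the Wild (identify names, drop duplicates, output what remains) can be sketched in plain Python. This is only an illustration of the human-readable algorithm, not the circuit GPT-2 small implements. The function name `indirect_object`, the `known_names` set, the whitespace tokenization, and the example prompt are all assumptions made for this sketch.

```python
from collections import Counter


def indirect_object(tokens, known_names):
    """Return the names that appear exactly once, in order of first appearance.

    tokens: list of word strings (hypothetical whitespace tokenization).
    known_names: set of strings treated as names (an assumption; a real
    model has no such lookup table).
    """
    # Step 1: identify all previous names in the sentence.
    names = [t for t in tokens if t in known_names]

    # Step 2: remove all names that are duplicated.
    counts = Counter(names)
    remaining = [n for n in dict.fromkeys(names) if counts[n] == 1]

    # Step 3: output the remaining name(s).
    return remaining


# Illustrative prompt whose names are (Mary, John, John).
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = [w.strip(",.") for w in prompt.split()]
result = indirect_object(tokens, known_names={"Mary", "John"})
print(result)
```

The function returns a list rather than a single string so that prompts with zero or several unduplicated names do not raise an error; for the prompt above the list holds one name.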