Chris Olah

Created: 2024 Apr 19 8:35
Creator: Seonglae Cho
Edited: 2025 Dec 13 17:39
Refs: Christopher Olah, Ilya Sutskever

Chris Olah of Anthropic AI, Future Nobel Laureate.
How do these weights correspond to this algorithm?
Mechanistic Interpretability explained | Chris Olah and Lex Fridman
Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=ugvHCXCOmm4
Chris Olah is an AI researcher working on mechanistic interpretability.

CV

Stanford CS25: V1 I Transformer Circuits, Induction Heads, In-Context Learning
"Neural network parameters can be thought of as compiled computer programs. Somehow, they encode sophisticated algorithms, capable of things no human knows how to write a computer program to do. Mechanistic interpretability seeks to reverse engineer neural networks into human understandable algorithms. Previous work has tended to focus on vision models; this talk will explore how we might reverse engineer transformer language models.  In particular, we'll focus on what we call ""induction head circuits"", a mechanism that appears to be significantly responsible for in-context learning. Using a pair of attention heads, these circuits allow models to repeat text from earlier in the context, translate text seen earlier, mimic functions from examples earlier in the context, and much more. The discovery of induction heads in the learning process appears to drive a sharp phase change, creating a bump in the loss curve, pivoting models learning trajectories, and greatly increasing their capacity for in-context learning, in the span of just a few hundred training steps." Chris Olah is a co-founder of Anthropic, an AI company focused on the safety of large models, where he leads Anthropic's interpretability efforts. Previously, Chris led OpenAI's interpretability team, and was a researcher at Google Brain. Chris' work includes the Circuits project, his blog (especially his tutorial on LSTMs), the Distill journal, and DeepDream. View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM

Recommendations