Chris Olah

Created: 2024 Apr 19 8:35
Creator: Seonglae Cho
Edited: 2025 Dec 13 17:39
Refs: Christopher Olah, Ilya Sutskever

Chris Olah of Anthropic AI, Future Nobel Laureate.
How do these weights correspond to this algorithm?
Mechanistic Interpretability explained | Chris Olah and Lex Fridman
Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=ugvHCXCOmm4
Chris Olah is an AI researcher working on mechanistic interpretability.

CV

Stanford CS25: V1 I Transformer Circuits, Induction Heads, In-Context Learning
"Neural network parameters can be thought of as compiled computer programs. Somehow, they encode sophisticated algorithms, capable of things no human knows how to write a computer program to do. Mechanistic interpretability seeks to reverse engineer neural networks into human understandable algorithms. Previous work has tended to focus on vision models; this talk will explore how we might reverse engineer transformer language models.  In particular, we'll focus on what we call ""induction head circuits"", a mechanism that appears to be significantly responsible for in-context learning. Using a pair of attention heads, these circuits allow models to repeat text from earlier in the context, translate text seen earlier, mimic functions from examples earlier in the context, and much more. The discovery of induction heads in the learning process appears to drive a sharp phase change, creating a bump in the loss curve, pivoting models learning trajectories, and greatly increasing their capacity for in-context learning, in the span of just a few hundred training steps." Chris Olah is a co-founder of Anthropic, an AI company focused on the safety of large models, where he leads Anthropic's interpretability efforts. Previously, Chris led OpenAI's interpretability team, and was a researcher at Google Brain. Chris' work includes the Circuits project, his blog (especially his tutorial on LSTMs), the Distill journal, and DeepDream. View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM

Recommendations