Neuronpedia Circuit Tracing

Creator: Seonglae Cho
Created: 2025 May 30 12:57
Edited: 2025 Aug 5 23:55

Neuronpedia
Research
circuit-tracer
safety-research
Updated 2026 Feb 10 14:51

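Two-step reasoning (e.g., Dallas→Texas→Austin) actually uses intermediate hops: the model represents the intermediate fact (Texas) internally rather than mapping Dallas straight to Austin. Reasoning is language-agnostic and is followed by language-specific feature combination. Cross-layer transcoders (CLT) show a better replacement score/sparsity tradeoff than per-layer transcoders (PLT), while skip PLTs generally offer fewer benefits.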
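The two-step reasoning claim above can be pictured with a toy attribution graph. The sketch below is only an illustration: the node names, the edge weights, and the choice to score a path by the product of its edge weights are all assumptions made for this example, not values or APIs taken from Neuronpedia, circuit-tracer, or any real model.

```python
# Toy attribution graph. All weights are made-up illustrative numbers
# (hypothetical), not measurements from any real model.
EDGES = {
    "token:Dallas": {"feature:Texas": 0.8, "logit:Austin": 0.1},
    "token:capital": {"feature:say-a-capital": 0.9},
    "feature:Texas": {"logit:Austin": 0.7},
    "feature:say-a-capital": {"logit:Austin": 0.6},
}


def paths_to(target, node, weight=1.0, prefix=()):
    """Yield (path, weight) for every path from `node` to `target`.

    The weight of a path is the product of its edge weights, a simplified
    stand-in for indirect influence along that path.
    """
    prefix = prefix + (node,)
    if node == target:
        yield prefix, weight
        return
    for child, w in EDGES.get(node, {}).items():
        yield from paths_to(target, child, weight * w, prefix)


def ranked_paths(source, target):
    """Return all source-to-target paths, strongest first."""
    return sorted(paths_to(target, source), key=lambda pw: pw[1], reverse=True)


if __name__ == "__main__":
    for path, weight in ranked_paths("token:Dallas", "logit:Austin"):
        print(" -> ".join(path), round(weight, 2))
```

With these hypothetical weights, the path through the intermediate "Texas" feature outranks the direct Dallas-to-Austin edge, which is the pattern the two-hop finding describes.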
The Circuits Research Landscape: Results and Perspectives - August 2025
A multi-organization interpretability project to replicate and extend circuit tracing research.
gemma-2-2b Attribution Graph
Attribution Graphs for Dummies - 1. What are Attribution Graphs?
Part 2: https://youtu.be/hdi1a9MjwDs
An introduction to attribution graphs from Anthropic's Circuit Tracing and Model Biology papers, featuring Jack Lindsey (Anthropic), Emmanuel Ameisen (Anthropic), Tom McGrath (Goodfire AI), and Neel Nanda (Google DeepMind).
0:00 Introduction
2:18 Attribution Graph Orientation
19:10 Analyzing an Attribution Graph from Scratch
40:25 Reflection: What have we Learned?
Explore Attribution Graphs: https://neuronpedia.org/graph
Blog Post: https://www.neuronpedia.org/graph/info
circuit-tracer GitHub: https://github.com/safety-research/circuit-tracer
Original Papers by Anthropic
- Circuit Tracing: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
- Biology of an LLM: https://transformer-circuits.pub/2025/attribution-graphs/biology.html