Neuronpedia Circuit Tracing

Neuronpedia Research
circuit-tracer
decoderesearch • Updated 2026 Jun 25 13:28

Two-step reasoning (e.g., Dallas→Texas→Austin) really does pass through the intermediate hop. In multilingual prompts, language-agnostic reasoning is followed by language-specific feature combination. Cross-layer transcoders (CLTs) show a better replacement-score/sparsity tradeoff than per-layer transcoders (PLTs), while skip PLTs generally offer smaller gains.
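To make the two-hop claim concrete, here is a minimal sketch on a toy attribution graph. Every node name and edge weight is hypothetical, and the sketch assumes a simplified model in which a path's contribution is the product of its edge weights and a node's total influence is the sum over all paths. It is not the circuit-tracer implementation. It asks how much of the "Dallas" node's influence on the "Austin" logit flows through the intermediate "Texas" node.

```python
# Toy attribution graph for the two-hop prompt Dallas -> Texas -> Austin.
# Hypothetical nodes and weights, chosen only for illustration.
EDGES = {
    "Dallas": {"Texas": 0.8, "Austin (logit)": 0.1},
    "Texas": {"say a capital": 0.3, "say Austin": 0.6},
    "say a capital": {"say Austin": 0.5},
    "say Austin": {"Austin (logit)": 0.9},
    "Austin (logit)": {},
}


def all_paths(graph, src, dst, prefix=None):
    """Yield every path from src to dst in a DAG as a list of node names."""
    prefix = (prefix or []) + [src]
    if src == dst:
        yield prefix
        return
    for nxt in graph[src]:
        yield from all_paths(graph, nxt, dst, prefix)


def path_weight(graph, path):
    """Product of edge weights along one path (assumes linear, multiplicative composition)."""
    weight = 1.0
    for a, b in zip(path, path[1:]):
        weight *= graph[a][b]
    return weight


def influence_through(graph, src, dst, via):
    """Return (total src->dst influence, fraction carried by paths that visit `via`)."""
    paths = list(all_paths(graph, src, dst))
    total = sum(path_weight(graph, p) for p in paths)
    through = sum(path_weight(graph, p) for p in paths if via in p)
    return total, through / total


total, share = influence_through(EDGES, "Dallas", "Austin (logit)", via="Texas")
print(f"total={total:.3f}, via Texas={share:.1%}")
```

With these made-up weights, 0.54 of the 0.64 total (about 84%) routes through Texas, while the direct Dallas→Austin edge carries only 0.1. Enumerating paths is exponential in general, so this approach only suits tiny toy graphs.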

The Circuits Research Landscape: Results and Perspectives - August 2025

A multi-organization interpretability project to replicate and extend circuit tracing research.

https://www.neuronpedia.org/graph/info

gemma-2-2b Attribution Graph

https://www.neuronpedia.org/gemma-2-2b/graph

Attribution Graphs for Dummies - 1. What are Attribution Graphs?

Part 2: https://youtu.be/hdi1a9MjwDs

An introduction to attribution graphs from Anthropic's Circuit Tracing and Model Biology papers, featuring Jack Lindsey (Anthropic), Emmanuel Ameisen (Anthropic), Tom McGrath (Goodfire AI), and Neel Nanda (Google DeepMind).

0:00 Introduction
2:18 Attribution Graph Orientation
19:10 Analyzing an Attribution Graph from Scratch
40:25 Reflection: What have we Learned?

Explore Attribution Graphs: https://neuronpedia.org/graph
Blog Post: https://www.neuronpedia.org/graph/info
circuit-tracer GitHub: https://github.com/safety-research/circuit-tracer

Original Papers by Anthropic:
- Circuit Tracing: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
- Biology of an LLM: https://transformer-circuits.pub/2025/attribution-graphs/biology.html

https://www.youtube.com/watch?v=ruLcDtr_cGo
