Transformer Debugger
A component can be an attention head, an MLP neuron, or an autoencoder latent
Each node exists only in one forward/backward pass. If you modify the prompt and rerun, that creates different nodes
Residual stream: the per-token vector that accumulates the outputs of the embedding, attention, and MLP components; each component reads from it and writes its output back into it
A circuit is a set of nodes that work together to perform some behavior or reasoning
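The residual-stream idea above can be sketched in a few lines. This is an illustrative toy, not TDB code: the stream starts as a token embedding and each layer's attention and MLP outputs are added into it (all arrays here are random stand-ins).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Residual stream for one token position, initialized to its embedding.
stream = rng.normal(size=d_model)

for layer in range(2):
    attn_out = rng.normal(size=d_model)  # stand-in for an attention head's write
    mlp_out = rng.normal(size=d_model)   # stand-in for an MLP neuron's write
    # Components add their outputs into the stream rather than replacing it.
    stream = stream + attn_out + mlp_out

print(stream.shape)
```

Because every component only adds to the stream, its individual contribution can later be isolated, which is what makes per-node analysis possible.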
TDB Usages
Terms
https://github.com/openai/transformer-debugger/blob/main/terminology.md
1. TDB Intro
In this video, I will walk you through Transformer Debugger, a tool developed to perform exploratory analyses on activations of Transformer language models. Similar to a Python debugger, Transformer Debugger allows you to step through language model outputs, trace important activations, and analyze upstream activations. I will explain how to use prompts, view model outputs, and interpret the node table.
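Tracing which upstream activation matters for an output can be approximated by a direct-effect calculation. The sketch below is a hedged illustration (not TDB's actual implementation, and all values are random stand-ins): each component's residual-stream write is projected onto the unembedding direction of a target token, giving one scalar per component.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_components = 16, 4

# Stand-ins: each row is one component's write into the residual stream.
writes = rng.normal(size=(n_components, d_model))
# Stand-in unembedding direction for the target token's logit.
w_unembed = rng.normal(size=d_model)

# Direct effect of each component on the target logit.
direct_effects = writes @ w_unembed
top_component = int(np.argmax(direct_effects))

print(top_component, direct_effects.round(2))
```

Ranking components by a score like this is one way a node table can surface the activations most responsible for a prediction.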
https://www.loom.com/share/721244075f12439496db5d53439d2f84?sid=8445200e-c49e-4028-8b8e-3ea8d361dec0
2. TDB neuron-viewer pages
In this video, I explain how the Transformer Debugger tool provides a prompt-centric view of important activations and model components in a Transformer model. I demonstrate how to navigate the tool and interpret the visualizations, such as the color map indicating attention strength. I also discuss the significance of specific attention heads and MLP neurons, highlighting their role in capturing patterns and making predictions.
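The attention strengths behind a color map like the one described above come from scaled dot-product attention. A minimal sketch, with random queries and keys as stand-ins and a causal mask so each token only attends to earlier positions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq, d_head = 5, 8
q = rng.normal(size=(seq, d_head))  # stand-in queries
k = rng.normal(size=(seq, d_head))  # stand-in keys

scores = q @ k.T / np.sqrt(d_head)
mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
scores[mask] = -np.inf              # causal mask: no attending to the future

attn = softmax(scores)              # each row sums to 1
print(attn.shape)
```

Each entry `attn[i, j]` is how strongly token i attends to token j, which is exactly the quantity a viewer can render as color intensity.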
https://www.loom.com/share/21b601b8494b40c49b8dc7bfd1dc6829?sid=ee23c00a-9ede-4249-b9d7-c2ba15993556

Seonglae Cho