An infrastructure tool built to enable interpretability research at Anthropic
Running a remote model via Garçon, with a Garçon client connected to the model
Hooks
PyTorch provides module hooks for inspecting and modifying activations. However, once a model scales beyond a single node, there is no obvious way to translate that workflow.
The basic interface to probes is a “probe function” that you provide. A probe function accepts two arguments: a “save context”, which can be used to save activations or other data for later, and the tensor at that particular point in the model. A probe function can return an updated tensor, which replaces the probed tensor in the computation, or return None to keep the original value. (This convention is borrowed from PyTorch hooks.)
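A minimal local sketch of the probe-function convention described above. It does not use Garçon: `SaveContext`, `apply_probe`, and the plain Python lists standing in for tensors are hypothetical, chosen so the example runs without PyTorch. A probe that returns None leaves the value unchanged; a probe that returns a value replaces it in the computation.

```python
from typing import Any, Callable, Dict, List, Optional


class SaveContext:
    """Hypothetical stand-in for the save context passed to probe functions."""

    def __init__(self) -> None:
        self.saved: Dict[str, Any] = {}

    def save(self, key: str, value: Any) -> None:
        self.saved[key] = value


Tensor = List[float]  # plain list standing in for a tensor
ProbeFn = Callable[[SaveContext, Tensor], Optional[Tensor]]


def apply_probe(probe: ProbeFn, ctx: SaveContext, tensor: Tensor) -> Tensor:
    """Run a probe; use its return value if it is not None, else keep the original."""
    result = probe(ctx, tensor)
    return tensor if result is None else result


def record_probe(ctx: SaveContext, x: Tensor) -> Optional[Tensor]:
    # Only records the activation; returning None keeps the original value.
    ctx.save("activation", list(x))
    return None


def zero_probe(ctx: SaveContext, x: Tensor) -> Optional[Tensor]:
    # Ablates the activation by returning a zeroed replacement.
    return [0.0 for _ in x]


ctx = SaveContext()
x = [1.0, -2.0, 3.0]
unchanged = apply_probe(record_probe, ctx, x)
ablated = apply_probe(zero_probe, ctx, x)
print(unchanged, ablated, ctx.saved)
```

Returning None versus a replacement is what lets the same mechanism serve both passive recording and interventions such as ablation.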
To collect activations, you create a probe, run the forward pass, and then retrieve the activations separately.
rmodel.recordings() returns a dictionary keyed by probe point name.
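The three-step workflow (create a probe, run the forward pass, retrieve recordings) can be sketched locally as follows. Only the name `recordings()` comes from these notes; `ToyRemoteModel`, `add_probe`, `forward`, and the probe point names `"layer0.out"` and `"layer1.out"` are hypothetical, and a plain dict stands in for the save context. This is not Garçon's actual API.

```python
from typing import Any, Callable, Dict, List, Optional

Tensor = List[float]
ProbeFn = Callable[[Dict[str, Any], Tensor], Optional[Tensor]]


class ToyRemoteModel:
    """Hypothetical local stand-in for a remote model (not Garçon's API)."""

    def __init__(self) -> None:
        self._probes: Dict[str, ProbeFn] = {}
        self._recordings: Dict[str, Any] = {}

    def add_probe(self, point: str, fn: ProbeFn) -> None:
        # Hypothetical: attach a probe function to a named probe point.
        self._probes[point] = fn

    def _probe_point(self, point: str, x: Tensor) -> Tensor:
        fn = self._probes.get(point)
        if fn is None:
            return x
        ctx: Dict[str, Any] = {}
        out = fn(ctx, x)
        if ctx:  # keep whatever the probe chose to save
            self._recordings[point] = ctx
        return x if out is None else out

    def forward(self, x: Tensor) -> Tensor:
        # Two toy "layers" (double, then ReLU), each followed by a probe point.
        h = self._probe_point("layer0.out", [2.0 * v for v in x])
        return self._probe_point("layer1.out", [max(0.0, v) for v in h])

    def recordings(self) -> Dict[str, Any]:
        # Saved data, keyed by probe point name.
        return dict(self._recordings)


rmodel = ToyRemoteModel()


def save_act(ctx: Dict[str, Any], x: Tensor) -> None:
    ctx["act"] = list(x)


rmodel.add_probe("layer0.out", save_act)  # 1. create a probe
out = rmodel.forward([1.0, -3.0])         # 2. run the forward pass
recs = rmodel.recordings()                # 3. retrieve activations separately
print(out, recs)
```

Keeping retrieval as a separate call means the forward pass itself only returns the model output, while any number of probes can save data on the side.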