Generalized min/max

Some features show consistent activations across the top ~60% of the activation spectrum, and then quickly become less interpretable as we look at smaller and smaller activations.
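One way to see where a feature stops being interpretable is to split its activating examples into quantile bins and read samples from each bin, strongest first. The sketch below makes several assumptions. The activations for a single feature are taken to be a 1-D array over dataset examples (or tokens). Only strictly positive values count as "activating". `quantile_bins` is a hypothetical helper, not part of any library.

```python
import numpy as np

def quantile_bins(activations, n_bins=10):
    """Group example indices into quantile bins of their positive activation.

    Hypothetical helper. Bin 0 holds the weakest activating examples and
    bin n_bins - 1 the strongest. Examples with activation <= 0 are dropped.
    """
    acts = np.asarray(activations, dtype=float)
    active = np.flatnonzero(acts > 0)  # indices where the feature fires
    if active.size == 0:
        return [np.array([], dtype=int) for _ in range(n_bins)]
    # Quantile edges over the activating values only.
    edges = np.quantile(acts[active], np.linspace(0.0, 1.0, n_bins + 1))
    # Digitizing against the interior edges yields bin ids 0 .. n_bins - 1.
    bin_ids = np.digitize(acts[active], edges[1:-1], right=True)
    return [active[bin_ids == b] for b in range(n_bins)]

# Example: with 10 bins, the top ~60% of activating examples is the last 6 bins.
bins = quantile_bins(np.arange(1, 101), n_bins=10)
top_60 = np.concatenate(bins[-6:])
```

Inspecting examples bin by bin, rather than only the top-k, shows where descriptions start to fail. Choosing quantile bins over equal-width bins of the activation range is a design choice here, and the two can give different cutoffs.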
Quantile based steering
Scaling Automatic Neuron Description | Transluce AI
We are releasing a database of descriptions of every neuron inside Llama-3.1-8B-Instruct,
and weights of an explainer model finetuned to produce them.
These descriptions have similar quality to a human expert on automated metrics,
and can be generated inexpensively using an 8B-parameter model.
These high-quality descriptions allow us to query and steer representations in
natural language, enabling applications such as our observability interface.
https://transluce.org/neuron-descriptions
Monitor: An AI-Driven Observability Interface
This write-up is a technical demonstration, which describes and evaluates the use of a new piece of technology. For technical demonstrations, we still run systematic experiments to test our findings, but do not run detailed ablations and controls. The claims are ones that we have tested and stand behind, but have not vetted as thoroughly as in our research reports.
https://transluce.org/observability-interface


Seonglae Cho