Activation Atlases

The word "atlas" refers to a map or collection that systematically organizes its parts. Here it means a visual layout that structures a model's internal representations (e.g., attention or activations).

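Activation Atlases sample activation vectors from spatial positions (patches) of a network layer across many images, project them onto a 2D map using t-SNE/UMAP, group nearby points into grid cells, and render each cell's averaged activation with feature visualization.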
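The layout step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the activations are synthetic random vectors, PCA (via SVD) stands in for t-SNE/UMAP so the example needs only NumPy, and the names `project_2d` and `build_atlas` are hypothetical. The final feature-visualization step that turns each averaged vector into an image is omitted.

```python
import numpy as np

def project_2d(acts):
    """Project activation vectors to 2D with PCA (a stand-in for t-SNE/UMAP)."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def build_atlas(acts, grid=8):
    """Bin the 2D layout into a grid x grid map and average the activations per cell."""
    xy = project_2d(acts)
    # Rescale each axis to [0, 1] so points can be assigned to grid cells.
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    unit = (xy - lo) / (hi - lo + 1e-12)
    cells = np.minimum((unit * grid).astype(int), grid - 1)
    atlas = {}
    for key in {tuple(c) for c in cells}:
        mask = (cells == key).all(axis=1)
        # One averaged activation vector per occupied cell.
        atlas[key] = acts[mask].mean(axis=0)
    return atlas

# Synthetic data (assumption): 1000 activation vectors with 64 channels each.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 64))
atlas = build_atlas(acts, grid=8)
```

Each entry of `atlas` maps a grid coordinate to the mean activation vector of the points that landed in that cell. In a real atlas, that mean vector would then be rendered as an image.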

2019 Activation Atlases (Shan Carter)

Introducing Activation Atlases

We’ve created activation atlases (in collaboration with Google researchers), a new technique for visualizing what interactions between neurons can represent. As AI systems are deployed in increasingly sensitive contexts, having a better understanding of their internal decision-making processes will let us identify weaknesses and investigate failures.

https://openai.com/index/introducing-activation-atlases/

SemanticLens

Unlike Activation Atlases, which work directly on activation patches, this method works at the neuron level: for each neuron it crops the top-m most relevant image patches using CRP (Concept Relevance Propagation) and embeds them into CLIP's semantic space. It also provides interpretability metrics such as Clarity, Redundancy, and Polysemanticity.
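To give a feel for what such metrics can look like, here is a minimal sketch of a clarity-like and a redundancy-like score. It is an assumption-laden illustration, not the paper's definitions: random vectors stand in for CLIP embeddings of a neuron's top-m patches, and the function names are hypothetical.

```python
import numpy as np

def mean_pairwise_cosine(embeddings):
    """Clarity-like score: average cosine similarity over all distinct pairs of rows."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    m = len(unit)
    # Exclude the diagonal (self-similarity) from the average.
    return (sims.sum() - np.trace(sims)) / (m * (m - 1))

def neuron_similarity(emb_a, emb_b):
    """Redundancy-like score: cosine similarity between two neurons' mean embeddings."""
    a, b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in data (assumption): 10 example embeddings of dimension 32 per neuron.
rng = np.random.default_rng(0)
theme = rng.normal(size=32)
coherent = theme + 0.1 * rng.normal(size=(10, 32))   # examples share one theme
scattered = rng.normal(size=(10, 32))                # examples share nothing

print(mean_pairwise_cosine(coherent) > mean_pairwise_cosine(scattered))
```

A neuron whose top examples all show the same concept scores close to 1 on the clarity-like measure, while a neuron with unrelated examples scores near 0. Two neurons with a high `neuron_similarity` would be candidates for redundancy.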


https://arxiv.org/pdf/2501.05398

Great demonstration with CLIP Vision Transformer using UMAP visualization

SemanticLens 1.1

https://semanticlens.hhi-research-insights.de/umap-view
