Vision SAE

2023~

https://sae-explorer.streamlit.app/

NSFW Features (

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers — LessWrong

Executive Summary In this post I present my results from training a Sparse Autoencoder (SAE) on a CLIP Vision Transformer (ViT) using the ImageNet-1k…

https://www.lesswrong.com/posts/bCtbuWraqYTDtuARg/towards-multimodal-interpretability-learning-sparse-2
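
For orientation, here is a minimal sketch of what a sparse autoencoder computes on a single ViT activation vector. It assumes a plain ReLU SAE with an L1 sparsity penalty. The linked post may use a different variant, and all names here (`sae_forward`, `sae_loss`, `l1_coeff`) are illustrative rather than taken from the post.

```python
def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Forward pass of a plain ReLU sparse autoencoder (assumed variant).

    x      : activation vector of length d_model
    W_enc  : n_features rows, each of length d_model
    W_dec  : d_model rows, each of length n_features
    Returns (feature activations, reconstruction).
    """
    # Subtract the decoder bias before encoding (a common convention).
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    f = [max(0.0, p) for p in pre]  # ReLU keeps features non-negative
    x_hat = [r + b for r, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparsity.
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(f)  # f is already non-negative after the ReLU
    return mse + l1_coeff * l1

# Toy usage with identity weights (hypothetical values, not trained).
identity = [[1.0, 0.0], [0.0, 1.0]]
f, x_hat = sae_forward([1.0, -2.0], identity, [0.0, 0.0], identity, [0.0, 0.0])
loss = sae_loss([1.0, -2.0], f, x_hat)
```

In practice the input would be a residual-stream activation for one patch or CLS token, and the feature dimension would be much larger than the model dimension.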

SAE analysis of fine-tuning in a multimodal model (2025)

Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering

Multimodal LLMs have reached remarkable levels of proficiency in understanding multimodal inputs, driving extensive research to develop increasingly powerful models. However, much less attention has been paid to understanding and explaining the underlying mechanisms of these models. Most existing explainability research examines these models only in their final states, overlooking the dynamic representational shifts that occur during training. In this work, we systematically analyze the evolution of hidden state representations to reveal how fine-tuning alters the internal structure of a model to specialize in new multimodal tasks. Using a concept-based approach, we map hidden states to interpretable visual and textual concepts, enabling us to trace changes in encoded concepts across modalities as training progresses. We also demonstrate the use of shift vectors to capture these concept changes. These shift vectors allow us to recover fine-tuned concepts by shifting those in the original model. Finally, we explore the practical impact of our findings on model steering, showing that we can adjust multimodal LLMs' behaviors without any training, such as modifying answer types, caption style, or biasing the model toward specific responses. Our work sheds light on how multimodal representations evolve through fine-tuning and offers a new perspective for interpreting model adaptation in multimodal tasks. The code for this project is publicly available at https://github.com/mshukor/xl-vlms.

https://arxiv.org/html/2501.03012v1
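
The shift-vector idea in the abstract above can be illustrated with a difference-of-means sketch. This is an assumption about the simplest possible form, not the paper's exact construction: the shift is the mean fine-tuned hidden state minus the mean original hidden state, and steering adds a scaled copy of it to a hidden state. The function names and the `alpha` parameter are hypothetical.

```python
def mean_vector(vectors):
    # Column-wise mean over a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def shift_vector(original_states, finetuned_states):
    """Difference of mean hidden states (assumed, simplified definition)."""
    mu_orig = mean_vector(original_states)
    mu_ft = mean_vector(finetuned_states)
    return [f - o for f, o in zip(mu_ft, mu_orig)]

def steer(hidden, shift, alpha=1.0):
    # Move a hidden state along the shift direction without any training.
    return [h + alpha * s for h, s in zip(hidden, shift)]

# Toy usage with made-up 2-dimensional hidden states.
shift = shift_vector([[0.0, 0.0], [2.0, 0.0]], [[1.0, 1.0], [3.0, 3.0]])
steered = steer([0.0, 0.0], shift, alpha=0.5)
```

The scale `alpha` controls how strongly the original model's representation is pushed toward the fine-tuned one.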

CLIP Vision Transformer

A suite of Vision Sparse Autoencoders — LessWrong

CLIP-Scope? Inspired by Gemma-Scope We trained 8 Sparse Autoencoders each on 1.2 billion tokens on different layers of a Vision Transformer. These (a…

https://www.lesswrong.com/posts/wrznNDMRmbQABAEMH/a-suite-of-vision-sparse-autoencoders

When normalized by the number of patches, the ViT was observed to have a similar number of features as language models. The authors also claim that reinserting the SAE reconstruction into the model reduces loss and removes noise.
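
One possible reading of "normalized by the number of patches" is the average number of active SAE features per patch token (a per-token L0). The sketch below computes that quantity under this assumption. The function name and the `eps` threshold are illustrative, and the linked post may define the normalization differently.

```python
def l0_per_patch(feature_acts, eps=0.0):
    """Average count of active features per patch token.

    feature_acts: one list of SAE feature activations per patch of an image.
    eps: activations must exceed this value to count as active.
    """
    total_active = sum(
        sum(1 for a in patch if a > eps) for patch in feature_acts
    )
    return total_active / len(feature_acts)

# Toy usage: two patches, three features each (made-up values).
avg_l0 = l0_per_patch([[0.0, 1.0, 0.0], [0.5, 0.2, 0.0]])
```

A per-token figure like this is what makes a ViT comparable to a language model, where L0 is usually reported per text token.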

ViT-Prisma

Prisma-Multimodal • Updated 2025 Oct 8 23:16

arxiv.org

https://arxiv.org/pdf/2504.19475

SemanticLens 1.1

https://semanticlens.hhi-research-insights.de/umap-view

Steering

arxiv.org

https://arxiv.org/pdf/2410.22366
