Pixtral vision transformer encoder (16x16 patches)
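The encoder works on 16x16-pixel patches, so the number of patch activations an image contributes to SAE training follows from its resolution. The sketch below is minimal. It assumes partial edge patches are padded (ceiling division) and ignores any resizing or special row-break/end tokens the Pixtral processor may add. `patch_grid` and `num_patches` are hypothetical helpers.

```python
import math

PATCH_SIZE = 16  # the 16x16 patch size noted above


def patch_grid(height: int, width: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (rows, cols) of patches, assuming partial edge patches are padded."""
    return math.ceil(height / patch_size), math.ceil(width / patch_size)


def num_patches(height: int, width: int, patch_size: int = PATCH_SIZE) -> int:
    """One encoder activation vector per patch (special tokens not counted)."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols


print(patch_grid(512, 384))   # (32, 24)
print(num_patches(512, 384))  # 768
```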
- SAE modeling
  - Mistral → Mistral Large 2 or Pixtral
  - Forward activation modification
  - Activation getter function
- SAE training
  - Vision-text data → cache the activations with their text pairs
  - Activation inference
  - SAE reconstruction training (train dataset)
- Feature naming
  - Store result statistics in SAELens format → Mistral API
  - Activation-text pair generation (test and validation sets)
  - LLM API prompt with high- and low-activation pairs
  - Save the SAE in SAELens format
- Steering
  - Concurrent generation and comparison
  - Chat and inference CLI
  - Chat demo with concurrent generation
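The SAE training items (cache activations, then train on a reconstruction objective) can be sketched in NumPy. This minimal sketch assumes a standard ReLU SAE with an L1 sparsity penalty and unit-norm decoder rows. The random `acts` matrix is a hypothetical stand-in for activations cached from Mistral or Pixtral. `SparseAutoencoder` and `train_step` are illustrative names, not the SAELens API.

```python
import numpy as np


class SparseAutoencoder:
    """Minimal ReLU SAE: f = ReLU((x - b_dec) @ W_enc + b_enc), x_hat = f @ W_dec + b_dec."""

    def __init__(self, d_model: int, d_sae: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        W_dec = rng.normal(size=(d_sae, d_model))
        self.W_dec = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm feature directions
        self.W_enc = self.W_dec.T.copy()  # tied at init, trained separately
        self.b_enc = np.zeros(d_sae)
        self.b_dec = np.zeros(d_model)

    def encode(self, x: np.ndarray) -> np.ndarray:
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        return f @ self.W_dec + self.b_dec


def train_step(sae: SparseAutoencoder, x: np.ndarray, lr: float = 1e-2, l1_coeff: float = 1e-3) -> float:
    """One gradient step on mean ||x - x_hat||^2 + l1_coeff * mean ||f||_1; returns the pre-step loss."""
    n = x.shape[0]
    x_centered = x - sae.b_dec
    pre = x_centered @ sae.W_enc + sae.b_enc
    f = np.maximum(0.0, pre)
    err = f @ sae.W_dec + sae.b_dec - x
    loss = (err ** 2).sum(axis=1).mean() + l1_coeff * f.sum(axis=1).mean()  # f >= 0, so |f| = f

    # Manual backprop through the decoder, the L1 term, the ReLU, and the encoder
    d_xhat = 2.0 * err / n
    d_f = d_xhat @ sae.W_dec.T + l1_coeff / n
    d_pre = d_f * (pre > 0)
    grads = {  # all gradients use the pre-update parameters
        "W_dec": f.T @ d_xhat,
        "W_enc": x_centered.T @ d_pre,
        "b_enc": d_pre.sum(axis=0),
        "b_dec": d_xhat.sum(axis=0) - (d_pre @ sae.W_enc.T).sum(axis=0),
    }
    for name, g in grads.items():
        setattr(sae, name, getattr(sae, name) - lr * g)
    # Re-normalize decoder rows so the L1 penalty cannot be dodged by rescaling
    sae.W_dec /= np.linalg.norm(sae.W_dec, axis=1, keepdims=True)
    return float(loss)


# Hypothetical stand-in for cached activations: sparse mixtures of 16 directions in R^8
rng = np.random.default_rng(1)
true_dirs = rng.normal(size=(16, 8))
codes = rng.random((512, 16)) * (rng.random((512, 16)) < 0.1)
acts = codes @ true_dirs

sae = SparseAutoencoder(d_model=8, d_sae=16)
losses = [train_step(sae, acts) for _ in range(500)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A full trainer such as SAELens typically adds more than this, for example learning-rate scheduling and dead-feature handling. The sketch covers only the core reconstruction-plus-sparsity objective.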
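The feature-naming steps (pair activations with text, then prompt an LLM with high- and low-activation examples) can be sketched in plain Python. The record layout, helper names, and prompt wording are all hypothetical. The resulting string would be sent as a chat message to the Mistral API. The client call is left out rather than guessed.

```python
def select_examples(records, feature_idx, k=3):
    """Return the k highest- and k lowest-activating (text, activation) pairs for one feature.

    records: list of (text, activations), where activations[i] is feature i's
    activation on that text (e.g. its max over tokens). Assumes len(records) >= 2 * k.
    """
    scored = [(text, acts[feature_idx]) for text, acts in records]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:k]
    bottom = scored[-k:][::-1]  # lowest activation first
    return top, bottom


def build_naming_prompt(feature_idx, top, bottom):
    """Format the contrastive examples into a single naming request."""
    lines = [f"Sparse autoencoder feature {feature_idx} activates strongly on some texts and weakly on others."]
    lines.append("High-activation examples:")
    lines += [f"- ({act:.2f}) {text}" for text, act in top]
    lines.append("Low-activation examples:")
    lines += [f"- ({act:.2f}) {text}" for text, act in bottom]
    lines.append("Reply with a short name (at most five words) for the concept this feature detects.")
    return "\n".join(lines)


records = [
    ("The Eiffel Tower is in Paris.", [0.0, 4.1]),
    ("Paris hosted the 2024 Olympics.", [0.2, 3.7]),
    ("I bought bread this morning.", [1.5, 0.0]),
    ("Lyon and Marseille are French cities.", [0.0, 2.9]),
    ("The stock market fell today.", [0.9, 0.1]),
]
top, bottom = select_examples(records, feature_idx=1, k=2)
prompt = build_naming_prompt(1, top, bottom)
print(prompt)
```

Showing low-activation examples alongside high ones gives the LLM a contrast, which helps it avoid naming a feature after something common to all the texts.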
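The steering step can be illustrated on raw arrays. This minimal sketch steers by adding a scaled, unit-normalized SAE decoder row to a layer's activations. That is one common approach; clamping the feature's activation is another. The arrays and the `steer` helper are hypothetical. In a PyTorch model, the same arithmetic would run inside a hook registered with `register_forward_hook` on the chosen layer. Generating from the same prompt with and without the hook gives the side-by-side comparison listed under Steering.

```python
import numpy as np


def steer(resid: np.ndarray, W_dec: np.ndarray, feature_idx: int, alpha: float) -> np.ndarray:
    """Add alpha times one SAE feature's unit decoder direction to every token position.

    resid: (seq_len, d_model) activations at the hooked layer
    W_dec: (d_sae, d_model) SAE decoder, one row per feature
    """
    direction = W_dec[feature_idx] / np.linalg.norm(W_dec[feature_idx])
    return resid + alpha * direction  # broadcasts over the sequence dimension


# Hypothetical stand-ins for a trained SAE decoder and one layer's activations
rng = np.random.default_rng(0)
W_dec = rng.normal(size=(16, 8))
resid = rng.normal(size=(5, 8))

baseline = resid  # unsteered run
steered = steer(resid, W_dec, feature_idx=3, alpha=4.0)

unit = W_dec[3] / np.linalg.norm(W_dec[3])
print((steered - baseline) @ unit)  # each position is shifted by alpha along the feature direction
```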
Hackathon Mistral AI Notion
Related Work
I have written the following articles that provide foundational insights guiding the development of this project:
- Reversing Transformer to Understand In-Context Learning with Phase Change, Feature Dimensionality, and Gradient Descent
This article explores how reversing transformers can shed light on in-context learning mechanisms, phase transitions, and feature dimensionality in large language models.
- Superposition Hypothesis for Steering LLM with Sparse Autoencoder
This post discusses how the superposition hypothesis can be applied to steer large language models using sparse autoencoders by isolating and manipulating specific features within the model.
Mistral Large 2
mistralai/Mistral-Large-Instruct-2407 · Hugging Face
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
Pixtral
https://huggingface.co/docs/transformers/main/model_doc/pixtral
Neuronpedia
Open Interpretability Platform
https://www.neuronpedia.org/
SAE Mistral
tylercosgrove/mistral-7b-sparse-autoencoder-layer16 · Hugging Face
https://huggingface.co/tylercosgrove/mistral-7b-sparse-autoencoder-layer16
Seonglae Cho