
Neuron SAE Implementation

Creator: Seonglae Cho
Created: 2024 Oct 31 9:45
Editor: Seonglae Cho
Edited: 2025 Jun 7 15:56
Refs
SAE Training
Vision SAE
Audio Model SAE

Code base

Neuron SAE Implementations
Gemma Scope
GPT2 SAE
Mistral SAE
Llama SAE
CoT SAE
Pythia SAE
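All of these implementations train the same core module: a one-hidden-layer autoencoder over model activations with an L1 sparsity penalty. Below is a minimal PyTorch sketch of that shared architecture; the 8× expansion factor, names, and coefficients are illustrative assumptions, not taken from any particular repository above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Minimal neuron SAE: reconstructs activations through an
    overcomplete, sparsely activated feature dictionary."""

    def __init__(self, d_model: int, expansion: int = 8):
        super().__init__()
        d_dict = d_model * expansion                     # overcomplete dictionary
        self.b_dec = nn.Parameter(torch.zeros(d_model))  # shared decoder bias
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        f = F.relu(self.encoder(x - self.b_dec))  # sparse feature activations
        x_hat = self.decoder(f) + self.b_dec      # reconstruction
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that induces sparsity
    recon = F.mse_loss(x_hat, x)
    sparsity = l1_coeff * f.abs().sum(dim=-1).mean()
    return recon + sparsity
```

In practice, activations are harvested from one hook point of the host model (residual stream or MLP output) and streamed through this module in large shuffled batches; the per-model repositories above differ mainly in where they hook and how they scale the dictionary.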

Training hyperparameters

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
"Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to reason about the behavior of the entire network. The first step in that program is to identify the correct components to analyze."
https://transformer-circuits.pub/2023/monosemantic-features#appendix-hyperparameters
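As a rough guide, a training configuration in the spirit of that appendix might look like the sketch below. Every value here is an illustrative placeholder rather than the paper's exact setting; consult the linked appendix before training.

```python
from dataclasses import dataclass

@dataclass
class SAETrainConfig:
    # Illustrative assumptions only; see the appendix linked above for
    # the hyperparameters actually used in Towards Monosemanticity.
    d_model: int = 512                       # width of the hooked activations
    expansion: int = 8                       # dictionary size = d_model * expansion
    l1_coeff: float = 1e-3                   # sparsity penalty weight
    lr: float = 1e-4                         # Adam learning rate
    batch_size: int = 4096                   # activation vectors per optimizer step
    total_activations: int = 8_000_000_000   # activations sampled from the LM
    resample_every: int = 25_000             # steps between dead-neuron resampling
```

Dead-neuron resampling (reinitializing dictionary features that stop firing) is one of the paper's key training details, so any reimplementation should budget for it alongside the loss itself.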
Demo
Google Colab: https://colab.research.google.com/drive/17dQFYUYnuKnP6OwQPH9v_GSYUW5aj-Rp?usp=sharing#scrollTo=mJ6bUncxGN2Y
SAE
Google Colab: https://colab.research.google.com/drive/1PlFzI_PWGTN9yCQLuBcSuPJUjgHL7GiD#scrollTo=SXLZn776f_2J

Copyright Seonglae Cho