Goodfire | AI Interpretability
https://platform.goodfire.ai/chat/new?model=70b
Removed approximately 30% and 3.5% of features in our Llama 3.1 8B and Llama 3.3 70B SAEs respectively, eliminating features that were determined to be harmful in the SAE analysis.

LLaMa 70B
Mapping the Latent Space of Llama 3.3 70B — "We have trained sparse autoencoders (SAEs) on Llama 3.3 70B and released the interpreted model for general access via an API."
https://www.goodfire.ai/papers/mapping-latent-spaces-llama
Goodfire/Llama-3.3-70B-Instruct-SAE-l50 · Hugging Face
https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l50

LLaMa 8B (for every layer)
Understanding and Steering Llama 3 with Sparse Autoencoders — "We present a novel approach to interpreting and controlling large language model behavior with sparse autoencoders, demonstrated through a desktop interface for Llama-3-8B."
https://www.goodfire.ai/papers/understanding-and-steering-llama-3
Goodfire/Llama-3.1-8B-Instruct-SAE-l19 · Hugging Face
https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19
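Since the SAE checkpoints above are hosted on Hugging Face, here is a minimal sketch of how one might download and apply one of them to a residual-stream activation. This assumes a standard ReLU sparse-autoencoder layout (encoder/decoder weights plus biases); the checkpoint filename and state-dict key names are assumptions for illustration, not confirmed details of the Goodfire repos, so check the repo file listing first.

```python
# Sketch: load an SAE checkpoint from Hugging Face and compute feature activations.
# Filename and state-dict keys below are ASSUMED -- inspect the repo before use.
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


class SparseAutoencoder(nn.Module):
    """Standard SAE: h = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec h + b_dec."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Sparse feature activations for a residual-stream vector x.
        return torch.relu(self.encoder(x - self.decoder.bias))

    def decode(self, feats: torch.Tensor) -> torch.Tensor:
        return self.decoder(feats)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x))


# Hypothetical filename; the actual file name in the repo may differ.
ckpt_path = hf_hub_download(
    repo_id="Goodfire/Llama-3.1-8B-Instruct-SAE-l19",
    filename="Llama-3.1-8B-Instruct-SAE-l19.pth",
)
state = torch.load(ckpt_path, map_location="cpu")

# Infer dimensions from the checkpoint instead of hard-coding them
# (key names "encoder.weight" etc. are assumptions).
d_hidden, d_model = state["encoder.weight"].shape
sae = SparseAutoencoder(d_model, d_hidden)
sae.load_state_dict(state)

# Apply to a layer-19 residual-stream activation (random stand-in tensor here).
x = torch.randn(1, d_model)
features = sae.encode(x)                 # sparse feature activations
reconstruction = sae.decode(features)    # projected back to the residual stream
print(features.count_nonzero().item(), "active features")
```

Feature-level interventions of the kind described above (e.g. removing features judged harmful) are typically applied at this point: selected entries of `features` are zeroed or rescaled before decoding, and the edited reconstruction is written back into the model's residual stream.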