Goodfire | AI Interpretability
https://platform.goodfire.ai/chat/new?model=70b
Removed approximately 30% and 3.5% of features in our Llama 3.1 8B and Llama 3.3 70B SAEs respectively, eliminating features that were determined to be harmful in the SAE analysis.

LLaMa 70B
Mapping the Latent Space of Llama 3.3 70B — "We have trained sparse autoencoders (SAEs) on Llama 3.3 70B and released the interpreted model for general access via an API."
https://www.goodfire.ai/papers/mapping-latent-spaces-llama
Goodfire/Llama-3.3-70B-Instruct-SAE-l50 · Hugging Face
https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l50

LLaMa 8B (for every layer)
Understanding and Steering Llama 3 with Sparse Autoencoders — "We present a novel approach to interpreting and controlling large language model behavior with sparse autoencoders, demonstrated through a desktop interface for Llama-3-8B."
https://www.goodfire.ai/papers/understanding-and-steering-llama-3
Goodfire/Llama-3.1-8B-Instruct-SAE-l19 · Hugging Face
https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19
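Since the SAE checkpoints above are hosted on Hugging Face, here is a minimal sketch of how one might download and apply one of them to a residual-stream activation. This assumes a standard ReLU sparse-autoencoder layout (encoder/decoder weights plus biases); the checkpoint filename and state-dict key names are assumptions for illustration, not confirmed details of the Goodfire repos, so check the repo file listing first.

```python
# Sketch: load an SAE checkpoint from Hugging Face and compute feature activations.
# Filename and state-dict keys below are ASSUMED -- inspect the repo before use.
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


class SparseAutoencoder(nn.Module):
    """Standard SAE: h = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec h + b_dec."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Sparse feature activations for a residual-stream vector x.
        return torch.relu(self.encoder(x - self.decoder.bias))

    def decode(self, feats: torch.Tensor) -> torch.Tensor:
        return self.decoder(feats)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x))


# Hypothetical filename; the actual file name in the repo may differ.
ckpt_path = hf_hub_download(
    repo_id="Goodfire/Llama-3.1-8B-Instruct-SAE-l19",
    filename="Llama-3.1-8B-Instruct-SAE-l19.pth",
)
state = torch.load(ckpt_path, map_location="cpu")

# Infer dimensions from the checkpoint instead of hard-coding them
# (key names "encoder.weight" etc. are assumptions).
d_hidden, d_model = state["encoder.weight"].shape
sae = SparseAutoencoder(d_model, d_hidden)
sae.load_state_dict(state)

# Apply to a layer-19 residual-stream activation (random stand-in tensor here).
x = torch.randn(1, d_model)
features = sae.encode(x)                 # sparse feature activations
reconstruction = sae.decode(features)    # projected back to the residual stream
print(features.count_nonzero().item(), "active features")
```

Feature-level interventions of the kind described above (e.g. removing features judged harmful) are typically applied at this point: selected entries of `features` are zeroed or rescaled before decoding, and the edited reconstruction is written back into the model's residual stream.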