SAE Finetuning

Creator: Seonglae Cho
Created: 2025 Feb 3 14:11
Edited: 2025 Mar 8 12:29

Gradient-based cleanup

The task vector is fine-tuned so that it can be reconstructed efficiently in the SAE basis while still capturing the activation patterns present in the residual stream. Gradient-based cleanup then removes features with small gradients, leaving a compact set of SAE features. The linked post reports that the cleaned vector performs better than the original task vector and is easier to interpret.
Extracting SAE task features for in-context learning — LessWrong
TL;DR * We try to study task vectors in the SAE basis. This is challenging because there is no canonical way to convert an arbitrary vector in the r…
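As an illustration of the idea above, here is a minimal sketch in plain Python. It is one possible reading of "gradient-based cleanup", not the linked post's implementation. The toy decoder, the `readout` vector, the scalar target `y`, the squared-error loss, the threshold, and all function names are assumptions made for this example; in practice the loss would come from the language model's behaviour on the task.

```python
# Toy gradient-based cleanup of SAE feature weights (illustrative assumptions only).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reconstruct(weights, decoder):
    """Weighted sum of SAE decoder directions (one row per feature)."""
    dim = len(decoder[0])
    return [sum(w * d[j] for w, d in zip(weights, decoder)) for j in range(dim)]

def loss_and_grad(weights, decoder, readout, y):
    """Hypothetical task loss (readout . reconstruction - y)^2 and its
    gradient with respect to each feature weight."""
    residual = dot(readout, reconstruct(weights, decoder)) - y
    grad = [2.0 * residual * dot(readout, d) for d in decoder]
    return residual ** 2, grad

def gradient_cleanup(weights, decoder, readout, y, threshold):
    """Return indices of features whose gradient magnitude reaches the threshold."""
    _, grad = loss_and_grad(weights, decoder, readout, y)
    return [i for i, g in enumerate(grad) if abs(g) >= threshold]

def finetune(weights, decoder, readout, y, keep, lr=0.1, steps=200):
    """Zero the removed features, then run gradient descent on the kept ones."""
    kept = set(keep)
    w = [wi if i in kept else 0.0 for i, wi in enumerate(weights)]
    for _ in range(steps):
        _, grad = loss_and_grad(w, decoder, readout, y)
        for i in kept:
            w[i] -= lr * grad[i]
    return w

# Three toy features in a 3-dimensional "residual stream".
decoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
readout = [1.0, 0.0, 0.5]   # the toy task ignores the second direction
weights = [0.5, 0.8, 0.2]   # initial SAE activations of the task vector
y = 2.0

keep = gradient_cleanup(weights, decoder, readout, y, threshold=0.1)
cleaned = finetune(weights, decoder, readout, y, keep)
final_loss, _ = loss_and_grad(cleaned, decoder, readout, y)
print(keep, cleaned, final_loss)
```

In this toy setup the second feature has no effect on the loss, so its gradient is zero and it is dropped; the remaining weights are then tuned on the task loss. Thresholding on gradient magnitude is a design choice of the sketch, and a real pipeline might instead use a sparsity penalty or attribution scores.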