SAE Finetuning

Creator
Seonglae Cho
Created
2025 Feb 3 14:11
Edited
2025 Mar 8 12:29

Gradient-based cleanup

The target vector is fine-tuned in the SAE basis so that it efficiently reconstructs the neuron activation patterns present in the residual. Gradient-based cleanup then removes features with small gradients, yielding a compact SAE. The result improves on the performance of the existing task vector and is also interpretable, because it is expressed in terms of SAE features.
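The cleanup step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the original method: the loss is a plain squared reconstruction error standing in for whatever objective is actually fine-tuned, the decoder is a toy matrix with orthonormal rows, and the names `reconstruction_grad`, `gradient_cleanup`, `finetune`, and the `keep` parameter are hypothetical.

```python
import numpy as np

def reconstruction_grad(coeffs, decoder, target):
    """Gradient of 0.5 * ||coeffs @ decoder - target||^2 with respect to coeffs.
    (Assumed stand-in loss; the real objective may differ.)"""
    return (coeffs @ decoder - target) @ decoder.T

def gradient_cleanup(coeffs, grads, keep):
    """Keep the `keep` features with the largest |gradient| and zero the rest."""
    mask = np.zeros(coeffs.shape, dtype=bool)
    mask[np.argsort(np.abs(grads))[-keep:]] = True
    return np.where(mask, coeffs, 0.0), mask

def finetune(decoder, target, keep, steps=200, lr=0.1):
    """Select features by gradient magnitude, then fit only those coefficients."""
    coeffs = np.zeros(decoder.shape[0])
    grads = reconstruction_grad(coeffs, decoder, target)
    coeffs, mask = gradient_cleanup(coeffs, grads, keep)
    for _ in range(steps):
        grads = reconstruction_grad(coeffs, decoder, target)
        coeffs = coeffs - lr * grads * mask  # pruned features stay at zero
    return coeffs, mask

# Toy setup: 8 SAE features in a 16-dimensional space (hypothetical sizes).
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(16, 8)))
decoder = q.T                                  # rows are orthonormal feature directions
true_coeffs = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0])
target = true_coeffs @ decoder                 # target vector built from two features

coeffs, mask = finetune(decoder, target, keep=2)
print(np.flatnonzero(mask), np.round(coeffs, 3))
```

In this toy case the two features that actually compose the target vector have the largest gradients, so they survive the cleanup and the remaining coefficients are never trained. A fixed `keep` count is only one possible criterion; a gradient-magnitude threshold would serve the same purpose.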
 
 
 
