Gradient-based cleanup
The task vector is fine-tuned in the SAE basis so that it efficiently reconstructs the neuron activation patterns present in the residual stream. Gradient-based cleanup then removes features with small gradients, leaving a compact set of SAE features. The resulting vector performs better than the original task vector and is easier to interpret.
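As a rough illustration of the idea, here is a minimal sketch in plain Python, not the implementation from the linked post. It makes several assumptions: the cleanup rule is an L1-penalized proximal gradient step (one common way to prune weakly contributing features), the task objective is replaced by a squared-error stand-in, and the dictionary is a toy set of orthonormal directions so the result can be checked by hand. The names `reconstruct` and `cleanup` and all hyperparameters are hypothetical.

```python
def reconstruct(weights, decoder):
    """Return sum_i weights[i] * decoder[i] as a plain list."""
    dim = len(decoder[0])
    out = [0.0] * dim
    for w, direction in zip(weights, decoder):
        if w != 0.0:
            for j in range(dim):
                out[j] += w * direction[j]
    return out


def cleanup(weights, decoder, target, l1=0.05, lr=0.1, steps=500):
    """Proximal gradient descent on 0.5 * ||recon - target||^2 + l1 * sum(w), with w >= 0.

    The squared error is only a stand-in (assumption) for whatever
    differentiable task objective is used in practice.
    """
    w = list(weights)
    for _ in range(steps):
        recon = reconstruct(w, decoder)
        residual = [r - t for r, t in zip(recon, target)]
        for i, direction in enumerate(decoder):
            # Gradient of the squared-error term with respect to weight i.
            grad = sum(res * d for res, d in zip(residual, direction))
            # Gradient step plus L1 shrinkage, clamped at zero so that
            # weakly supported features are switched off exactly.
            w[i] = max(0.0, w[i] - lr * (grad + l1))
    return w


# Toy dictionary (hypothetical): 6 orthonormal feature directions.
# A real SAE dictionary is learned and overcomplete.
n = 6
decoder = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Stand-in target: only features 0 and 3 matter for the "task".
target = [2.0, 0.0, 0.0, 1.0, 0.0, 0.0]

# Noisy initial code, as if read off an encoder: every feature is slightly active.
initial = [t + 0.3 for t in target]

cleaned = cleanup(initial, decoder, target)
active_before = [i for i, w in enumerate(initial) if w > 0]
active_after = [i for i, w in enumerate(cleaned) if w > 0]
print(len(active_before), "->", len(active_after), "active features:", active_after)
```

In this toy setup the four spurious features shrink to exactly zero while the two task-relevant ones survive with slightly reduced weights, which is the usual trade-off of an L1 penalty: sparsity in exchange for a small bias in the remaining coefficients.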
Extracting SAE task features for in-context learning — LessWrong
TL;DR * We try to study task vectors in the SAE basis. This is challenging because there is no canonical way to convert an arbitrary vector in the r…
https://www.lesswrong.com/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning

Seonglae Cho