SAE Layer Transferability

Creator: Seonglae Cho
Created: 2025 Jan 26 22:10
Edited: 2025 Feb 27 16:10

SAE Transferability

SAEs (usually) Transfer Between Base and Chat Models — LessWrong
This is an interim report sharing preliminary results that we are currently building on. We hope this update will be useful to related research occur…

Transfer Learning across layers

By leveraging shared representations between adjacent layers, transfer learning can significantly reduce training cost and time compared to training a Sparse Autoencoder (SAE) from scratch. Backward transfer performed better than forward transfer, which can be understood as starting with prior knowledge of the computation's results.
  • forward SAE (initialize from an earlier layer's SAE)
  • backward SAE (initialize from a later layer's SAE)
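The idea above can be sketched as follows. This is a minimal, hypothetical illustration (the SAE structure and `init_sae`/`sae_forward` helpers are assumptions, not the cited paper's code): instead of randomly initializing the SAE for layer l, backward transfer copies the parameters of an already-trained layer-(l+1) SAE and fine-tunes from there.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 64, 256  # toy sizes for illustration

def init_sae(d_model, d_hidden, rng):
    """Randomly initialize SAE parameters (encoder/decoder weights and biases)."""
    return {
        "W_enc": rng.standard_normal((d_hidden, d_model)) * 0.01,
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.standard_normal((d_model, d_hidden)) * 0.01,
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """Standard SAE: x -> h = ReLU(W_enc x + b_enc) -> x_hat = W_dec h + b_dec."""
    h = np.maximum(0.0, params["W_enc"] @ x + params["b_enc"])
    return params["W_dec"] @ h + params["b_dec"], h

# Pretend this SAE was already trained on layer l+1 activations.
sae_next = init_sae(d_model, d_hidden, rng)

# Backward transfer: initialize the layer-l SAE from the layer-(l+1) SAE's
# parameters, then fine-tune on layer-l activations instead of training
# from scratch. The shared representation between adjacent layers means
# far fewer fine-tuning steps are needed.
sae_this = {k: v.copy() for k, v in sae_next.items()}

# Sanity check: both SAEs reconstruct identically before fine-tuning.
x = rng.standard_normal(d_model)
x_hat_next, _ = sae_forward(sae_next, x)
x_hat_this, _ = sae_forward(sae_this, x)
```

Forward transfer is the same procedure with the copy direction reversed (layer l's SAE seeding layer l+1's).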
aclanthology.org
