SAE Steerability

Creator
Seonglae Cho
Created
2025 Jan 8 21:8
Edited
2025 Jan 8 21:13

SAE steerability measures how much the model's output distribution changes when a latent's vector is added.

  1. For each latent, compute a set of related logits by projecting the latent's direction onto the unembedding matrix
  2. Scale the latent's direction by a factor α and add it to the model input
  3. Measure the probability change (the added probability) for the logits included in that set
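
The three steps above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the original method's implementation: the names `W_U`, `direction`, `related_tokens`, `added_probability`, and the top-`k` cutoff are hypothetical, the weights are random, and the scaled direction is added to a vector that is fed directly to the unembedding rather than to a real model's input.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unembed(h, W_U):
    # W_U has shape (d_model, vocab); returns one logit per vocabulary token.
    vocab = len(W_U[0])
    return [sum(h[i] * W_U[i][t] for i in range(len(h))) for t in range(vocab)]


def related_tokens(direction, W_U, k):
    # Step 1: project the latent direction onto the unembedding matrix
    # and keep the k tokens with the highest projected score (assumed cutoff).
    scores = unembed(direction, W_U)
    return sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]


def added_probability(h, direction, W_U, alpha, k=3):
    tokens = related_tokens(direction, W_U, k)
    base = softmax(unembed(h, W_U))
    # Step 2: add the latent direction, scaled by alpha, to the toy activation.
    steered_h = [x + alpha * d for x, d in zip(h, direction)]
    steered = softmax(unembed(steered_h, W_U))
    # Step 3: total probability gained by the related tokens.
    return sum(steered[t] - base[t] for t in tokens)


# Toy data: random unembedding matrix, activation, and latent direction.
random.seed(0)
d_model, vocab = 8, 20
W_U = [[random.gauss(0, 1) for _ in range(vocab)] for _ in range(d_model)]
h = [random.gauss(0, 1) for _ in range(d_model)]
direction = [random.gauss(0, 1) for _ in range(d_model)]

for alpha in (0.0, 1.0, 4.0):
    print(alpha, round(added_probability(h, direction, W_U, alpha), 4))
```

In this toy setting the added probability is zero at α = 0 and does not decrease as α grows, because the related tokens are exactly those whose logits receive the largest boost from the added direction.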
 
 
 
 
 
 
 
