SAE-TS

Creator
Seonglae Cho
Created
2025 Feb 9 16:5
Edited
2025 Mar 10 16:9
Refs
  1. Collect data on how adding a decoder steering vector changes the encoded SAE feature activations.
  2. Train a linear predictor that takes a decoder steering vector as input and outputs the resulting difference in the feature vector.
  3. Use the trained predictor to compute an optimized steering vector for the target feature.
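The three steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the reference SAE-TS implementation: the function names `fit_effect_predictor` and `targeted_steering_vector`, the toy dimensions, and the choice of a plain least-squares fit are all assumptions made for the example. In practice the (steering vector, feature difference) pairs would come from running the model with and without steering and encoding the activations with the SAE.

```python
import numpy as np

def fit_effect_predictor(steering_vectors, feature_diffs):
    """Fit feature_diffs ~= steering_vectors @ M + b by least squares.

    steering_vectors: (n, d_model) vectors that were added to the model.
    feature_diffs:    (n, d_sae) measured change in SAE feature activations.
    """
    n = steering_vectors.shape[0]
    # Append a column of ones so the bias is learned together with M.
    X = np.hstack([steering_vectors, np.ones((n, 1))])
    W, *_ = np.linalg.lstsq(X, feature_diffs, rcond=None)
    return W[:-1], W[-1]  # M: (d_model, d_sae), b: (d_sae,)

def targeted_steering_vector(M, target):
    """Unit-norm vector that maximizes the predicted change of one feature.

    Hypothetical simplification: it ignores side effects on other features.
    """
    direction = M[:, target]
    return direction / np.linalg.norm(direction)

# Step 1 (stand-in): synthetic "measurements" from a hidden linear effect.
rng = np.random.default_rng(0)
d_model, d_sae, n = 16, 32, 500
true_M = rng.normal(size=(d_model, d_sae))
true_b = rng.normal(size=d_sae)
xs = rng.normal(size=(n, d_model))
ys = xs @ true_M + true_b + 0.01 * rng.normal(size=(n, d_sae))

# Step 2: train the linear predictor.
M, b = fit_effect_predictor(xs, ys)

# Step 3: build a steering vector for the target feature.
target = 5
s = targeted_steering_vector(M, target)
predicted_effect = s @ M + b  # predicted change for every SAE feature
```

Inspecting `predicted_effect` for features other than `target` gives a quick view of the side effects the predictor expects; a fuller version would penalize those when constructing the vector.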
SAE feature activation values can also be used to build steering vectors. There are two ways to obtain a steering vector from an SAE feature: simple decoding and SAE-TS. In both cases, the steering coefficient plays an important role. To understand features and steer LLMs efficiently, it is crucial to understand which factors affect SAE feature activation and what patterns they follow.
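For comparison, simple decoding can be sketched as taking one feature's decoder direction and scaling it by a coefficient. This is an assumed, simplified form: the decoder matrix `W_dec` is random here rather than taken from a trained SAE, the helper names are hypothetical, and normalizing the direction before scaling is one possible convention among several.

```python
import numpy as np

def decode_steering_vector(W_dec, feature, coefficient):
    """Steering vector from a single SAE feature via its decoder row.

    W_dec: (d_sae, d_model) decoder matrix (random stand-in in this sketch).
    The direction is normalized so `coefficient` alone sets the strength.
    """
    direction = W_dec[feature]
    return coefficient * direction / np.linalg.norm(direction)

def steer(resid, vector):
    """Add the steering vector to every position of the activations."""
    return resid + vector

rng = np.random.default_rng(0)
d_sae, d_model, seq_len = 32, 16, 4
W_dec = rng.normal(size=(d_sae, d_model))
resid = rng.normal(size=(seq_len, d_model))

vec = decode_steering_vector(W_dec, feature=5, coefficient=8.0)
steered = steer(resid, vec)
```

Sweeping `coefficient` over a range of values is a simple way to see how strongly the outcome depends on it.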
 
 
 
 
 
 
