SAE Steering

Creator: Seonglae Cho
Created: 2025 Mar 10 16:8
Edited: 2025 Aug 9 1:1

SAE steering is typically applied at the attention sink token or at the last tokens before generation.

  • How to find the features to steer
  • How much to steer those features
  • Which token positions to steer those features at
  • How to apply those features
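The questions above can be made concrete with a minimal sketch. It assumes that steering means adding `coefficient * decoder_direction` to the residual stream at the chosen token positions, and that the attention sink sits at position 0. The function `steer_residual` and the toy values are hypothetical and do not come from any particular library.

```python
from typing import List, Sequence

Vector = List[float]


def steer_residual(
    residual: List[Vector],
    decoder_direction: Vector,
    coefficient: float,
    positions: Sequence[int],
) -> List[Vector]:
    """Return a copy of `residual` with coefficient * direction added at `positions`.

    residual:          one hidden vector per token (toy stand-in for a residual stream)
    decoder_direction: the SAE decoder vector of the feature being steered (assumed)
    coefficient:       how much to steer
    positions:         which token positions to steer
    """
    steered = [list(row) for row in residual]  # copy so the input is left untouched
    for pos in positions:
        steered[pos] = [
            h + coefficient * d for h, d in zip(steered[pos], decoder_direction)
        ]
    return steered


# Toy residual stream: 4 tokens, hidden size 3.
residual = [[0.0, 0.0, 0.0] for _ in range(4)]
direction = [1.0, 0.0, -1.0]

# Steer only the last token before generation (negative indices work as in lists).
last_token = steer_residual(residual, direction, coefficient=2.0, positions=[-1])

# Steer only position 0, assumed here to be the attention sink token.
sink_token = steer_residual(residual, direction, coefficient=2.0, positions=[0])
```

In this sketch the feature choice is the direction vector, the strength is `coefficient`, and the token choice is `positions`, so each question maps to one argument.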
SAE Feature Steering Methods
 
 
https://www.tilderesearch.com/blog/sieve
 
 

GoodFire-Autosteer-Evaluation
Eitan-Sprejer, updated 2025 May 24 17:28

GoodFire AI's AutoSteer automatically selects the features in the Dictionary that best distinguish between the control and example datasets. While this provides more direct and explainable behavior control than prompt manipulation alone, manual selection methods outperformed AutoSteer on Llama-70B in both behavior and consistency.
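As a rough illustration of selecting features that separate two datasets, the sketch below ranks features by the difference in mean activation between an example set and a control set. This is an assumed stand-in, not GoodFire's actual AutoSteer algorithm; `select_contrastive_features` and the toy activations are hypothetical.

```python
from typing import List


def mean_activation(acts: List[List[float]]) -> List[float]:
    """Average each feature's activation over all samples (rows)."""
    n = len(acts)
    return [sum(column) / n for column in zip(*acts)]


def select_contrastive_features(
    control: List[List[float]],
    example: List[List[float]],
    top_k: int,
) -> List[int]:
    """Return indices of the top_k features that fire more on `example` than `control`."""
    control_mean = mean_activation(control)
    example_mean = mean_activation(example)
    diffs = [e - c for e, c in zip(example_mean, control_mean)]
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return ranked[:top_k]


# Toy SAE activations: 2 samples per dataset, 4 features.
control = [[0.1, 0.0, 0.5, 0.0], [0.0, 0.0, 0.5, 0.25]]
example = [[0.1, 2.0, 0.5, 1.0], [0.0, 1.0, 0.5, 1.25]]

selected = select_contrastive_features(control, example, top_k=2)
print(selected)  # [1, 3]
```

Features 0 and 2 activate equally on both datasets, so they are ranked below features 1 and 3, which fire mostly on the example set.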
The Steering Vector Coefficient for each feature is stored in the Edit Set. The coefficients are found by optimizing activations, with L1 regression automatically regularizing the features that are adjusted simultaneously. Evaluation was conducted using LLM-as-judge to assess both Behavior and Coherence.
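To show what L1 regularization does to a set of jointly adjusted coefficients, here is a minimal sketch that fits sparse coefficients so that a weighted sum of feature directions approximates a target activation shift. It assumes a lasso-style objective, 0.5 * ||target - sum_j c_j d_j||^2 + penalty * ||c||_1, solved by coordinate descent. This is an interpretation of "L1 regression", not the documented AutoSteer procedure, and all names are hypothetical.

```python
from typing import List


def soft_threshold(value: float, penalty: float) -> float:
    """Shrink `value` toward zero by `penalty`; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def lasso_coefficients(
    directions: List[List[float]],
    target: List[float],
    penalty: float,
    sweeps: int = 50,
) -> List[float]:
    """Coordinate descent for the L1-penalized least-squares objective above."""
    coeffs = [0.0] * len(directions)
    for _ in range(sweeps):
        for j, d_j in enumerate(directions):
            # Target with every other feature's current contribution removed.
            partial = list(target)
            for k, d_k in enumerate(directions):
                if k != j:
                    partial = [p - coeffs[k] * x for p, x in zip(partial, d_k)]
            coeffs[j] = soft_threshold(dot(d_j, partial), penalty) / dot(d_j, d_j)
    return coeffs


# Toy setup: three orthonormal feature directions and a desired activation shift.
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [2.0, 0.05, -1.0]

coeffs = lasso_coefficients(directions, target, penalty=0.1)
```

With these toy values the weakly needed second feature is driven to exactly zero while the other two are shrunk slightly, which is the sparsifying effect that L1 regularization is used for.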
Feature searching
 
 
