Steering Vector

Creator
Seonglae Cho
Created
2024 Apr 25 14:14
Edited
2025 Jun 2 0:54

Feature Steering

Sparse autoencoder (SAE)-based vectors enable more fine-grained control. Because SAE features are designed to be sparse, steering along a single feature direction minimizes the impact on other behaviors, allowing for more precise adjustments.
Intervening at later residual streams produces stronger steering, but modifying the very last residual stream reliably causes broken syntax (Turner, 2024).
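As a rough illustration of what "steering" means here, the sketch below adds a scaled feature direction to every token position of a toy residual stream. The names `steer`, `feature_direction`, and `alpha`, and all the numbers, are hypothetical and chosen only for illustration. In a real model the same addition would be applied to the activations of one chosen layer, and the direction would typically come from an SAE decoder.

```python
def steer(residual, direction, alpha):
    """Return a new residual stream with alpha * direction added at every token position."""
    return [
        [h + alpha * d for h, d in zip(token, direction)]
        for token in residual
    ]


# Toy residual stream: 2 token positions, 4 dimensions (made-up values).
residual = [[0.0, 1.0, 0.0, 2.0], [1.0, 0.0, 1.0, 0.0]]

# Hypothetical SAE decoder direction for a single feature (assumed unit-norm).
feature_direction = [0.0, 0.0, 1.0, 0.0]

# alpha controls steering strength; larger values push harder along the feature.
steered = steer(residual, feature_direction, alpha=4.0)
print(steered)
```

Only the coordinate aligned with the feature direction changes, which is the intuition behind sparse directions disturbing other behaviors less.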

2016 smiling attribute vector

Computed by subtracting the mean vector of images without the smile attribute from the mean vector of images with the smile attribute.
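The difference-of-means computation described above can be sketched in a few lines. This is a minimal example on made-up 3-dimensional vectors, not real image encodings. The helper names `mean_vector` and `attribute_vector` are hypothetical.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def attribute_vector(with_attr, without_attr):
    """Difference of means: mean(with attribute) - mean(without attribute)."""
    mu_pos = mean_vector(with_attr)
    mu_neg = mean_vector(without_attr)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


# Toy vectors standing in for encoded images (made-up numbers).
smiling = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
neutral = [[0.0, 2.0, 1.0], [2.0, 2.0, 1.0]]

smile_vector = attribute_vector(smiling, neutral)
print(smile_vector)
```

Adding `smile_vector` (optionally scaled) to the vector of a new image would move it toward the "smiling" side, assuming the attribute is roughly linear in that space.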

Anthropic SAE steering feature vector with limitation and application (2024)

Latent steering vector from 2022 ACL

Style vector

