Feature Steering
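Sparse autoencoder (SAE) feature vectors enable more fine-grained control. Because SAEs are trained to represent activations with sparse, ideally interpretable features, steering along a single feature's direction aims to minimize the impact on other behaviors, allowing for more precise adjustments.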
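One common way to do SAE-based steering is to add a scaled copy of a single feature's decoder direction to a residual-stream activation. The sketch below assumes that approach. The names `decoder`, `feature` and `alpha` are hypothetical, and the numbers are toy values rather than a trained SAE.

```python
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit Euclidean norm."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def steer_with_sae_feature(h: Vector, decoder: List[Vector], feature: int, alpha: float) -> Vector:
    """Return h + alpha * (unit-norm decoder direction of `feature`).

    Hypothetical layout: `decoder` holds one row per SAE feature (n_features x d_model).
    """
    direction = normalize(decoder[feature])
    return [hi + alpha * di for hi, di in zip(h, direction)]


# Toy dictionary: 3 SAE features over a 4-dimensional residual stream (made-up values).
decoder = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
h = [0.5, 0.5, 0.5, 0.5]
steered = steer_with_sae_feature(h, decoder, feature=1, alpha=10.0)
print(steered)  # [0.5, 6.5, 8.5, 0.5]
```

Because this toy dictionary is orthogonal, the update has no component along features 0 and 2. Real SAE decoder directions generally overlap, so some spillover onto other features can still occur.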
Intervening at later residual streams produces stronger steering, but modifying the very last residual stream reliably causes broken syntax (Turner, 2024).
Steering Vector Methods
Feature Steering Notion
2016 smiling attribute vector
Computed by simply subtracting the mean latent vector of images without the smile attribute from the mean latent vector of images with the smile attribute.
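The mean-difference computation above can be sketched as follows, assuming each image is already encoded as an equal-length latent vector. The function name `attribute_vector` and the latent values are hypothetical toy data, and adding the result to another latent is shown only as an illustration of how such a vector might be applied.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attribute_vector(with_attr: Sequence[Sequence[float]],
                     without_attr: Sequence[Sequence[float]]) -> Vector:
    """Mean latent of examples with the attribute minus mean latent of examples without it."""
    mu_with = mean_vector(with_attr)
    mu_without = mean_vector(without_attr)
    return [a - b for a, b in zip(mu_with, mu_without)]


# Toy 2-D latents (made-up values, not real image encodings).
smiling = [[1.0, 2.0], [3.0, 4.0]]
not_smiling = [[0.0, 1.0], [2.0, 1.0]]
smile_vec = attribute_vector(smiling, not_smiling)  # [1.0, 2.0]

# Illustrative use: shift another latent along the attribute direction.
neutral_face = [0.5, 0.5]
edited = [z + s for z, s in zip(neutral_face, smile_vec)]  # [1.5, 2.5]
```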