STA

Creator
Seonglae Cho
Created
2026 Jan 9 22:35
Edited
2026 Jan 9 22:40

Steering Target Atoms

Traditional steering uses difference vectors, but because the underlying representations are entangled, it is prone to side effects: general performance degrades, and properties other than the target change as well. STA instead scores each atom (feature) by Amplitude (how strongly it activates) and Frequency (how often it activates), keeps only the atoms that pass hand-picked thresholds (magic numbers), and fixes their coefficients at 1. It then reconstructs the steering vector from the selected atoms alone and injects it. It did not improve performance, but it was used for safety control on SafeEdit and RealToxicityPrompts.
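The selection-and-reconstruction step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method's reference implementation. It assumes atom activations are collected on two sets of samples (here called `pos_acts` and `neg_acts`), that Amplitude and Frequency are taken as differences between the two sets, and that each atom has a decoder direction in `decoder`. It also reads "coefficients fixed at 1" as each selected atom contributing its direction with weight 1, which is one possible interpretation. All function names, thresholds, and toy values are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def select_atoms(pos_acts: Matrix, neg_acts: Matrix,
                 amp_threshold: float, freq_threshold: float) -> List[int]:
    """Return indices of atoms whose amplitude and frequency gaps both pass."""
    n_atoms = len(pos_acts[0])
    selected = []
    for j in range(n_atoms):
        pos_col = [row[j] for row in pos_acts]
        neg_col = [row[j] for row in neg_acts]
        # Amplitude (assumed definition): gap in mean activation strength.
        amplitude = sum(pos_col) / len(pos_col) - sum(neg_col) / len(neg_col)
        # Frequency (assumed definition): gap in the fraction of samples
        # on which the atom fires at all.
        frequency = (sum(a > 0 for a in pos_col) / len(pos_col)
                     - sum(a > 0 for a in neg_col) / len(neg_col))
        if amplitude >= amp_threshold and frequency >= freq_threshold:
            selected.append(j)
    return selected


def build_steering_vector(decoder: Matrix, selected: Sequence[int]) -> List[float]:
    """Sum the decoder directions of the selected atoms, each with coefficient 1."""
    d_model = len(decoder[0])
    vector = [0.0] * d_model
    for j in selected:
        for k in range(d_model):
            vector[k] += 1.0 * decoder[j][k]
    return vector


def inject(hidden: Sequence[float], steering: Sequence[float]) -> List[float]:
    """Add the steering vector to a single hidden state."""
    return [h + s for h, s in zip(hidden, steering)]


# Toy data (made up): 3 atoms, 2-dimensional hidden state.
pos_acts = [[2.0, 0.0, 0.5], [3.0, 0.0, 0.0], [2.5, 0.1, 0.0]]
neg_acts = [[0.0, 0.0, 0.4], [0.0, 0.2, 0.0], [0.5, 0.0, 0.0]]
decoder = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

selected = select_atoms(pos_acts, neg_acts, amp_threshold=1.0, freq_threshold=0.5)
steering = build_steering_vector(decoder, selected)
steered = inject([0.2, -0.1], steering)
```

Requiring both thresholds to pass is what filters out atoms that fire strongly but rarely, or often but weakly. The two threshold values are the magic numbers mentioned above and would need tuning per model and layer.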
 
 
 
 
 
 
