Steering Target Atoms
Traditional steering uses difference vectors, but the representations are entangled, making it prone to side effects (general performance degradation, other properties changing as well). By looking at Amplitude (how strongly it activates) and Frequency (how often it activates), features are selected using threshold magic numbers, with coefficients fixed at 1. STA reconstructs the steering vector using only the selected atoms and injects it. While it didn't improve performance, it was used for safety control with SafeEdit and RealToxicPrompts.

Seonglae Cho