Additive steering becomes unstable when mixed with LayerNorm. Some tokens are affected excessively while others show almost no effect. This breaks coherence, causes entropy collapse, and leads to attention breakdown.
Transformer residuals essentially work as: information = direction, magnitude = confidence/energy. We can change only the information direction while preserving energy. This is a Geodesic update on the unit hypersphere.
Rotation steering = structured redistribution. It moves only within the 2D subspace spanned by h and s, while preserving all other high-dimensional features. In other words, it moves along the existing feature correlation manifold. This is why KL/NBF stability improves and the original distribution is maintained.
arxiv.org
https://arxiv.org/pdf/2602.01716

Seonglae Cho