Analysing the Generalisation and Reliability of Steering Vectors
Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of...
https://openreview.net/forum?id=v8X70gTodR
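To make "intervening on intermediate model activations" concrete, here is a minimal sketch in plain Python. It assumes one common construction, the difference between the mean activations of two contrasting prompt sets, which the snippet above does not itself specify. The function names, the scaling factor alpha, and the toy numbers are hypothetical and come from no particular model or library.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Mean-difference steering vector (assumed construction, see lead-in)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Vector, sv: Vector, alpha: float = 1.0) -> Vector:
    """Add the scaled steering vector to a single intermediate activation."""
    return [h + alpha * s for h, s in zip(hidden, sv)]


# Toy activations (hypothetical numbers, not taken from any real model).
positive_acts = [[1.0, 2.0], [3.0, 4.0]]
negative_acts = [[0.0, 1.0], [2.0, 1.0]]

sv = steering_vector(positive_acts, negative_acts)
steered = apply_steering([0.5, 0.5], sv, alpha=2.0)
```

In a real model the same addition would be applied to one layer's hidden state during the forward pass. The choice of layer and of alpha is left open here.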
SER
https://arxiv.org/pdf/2508.12535

Seonglae Cho