
Steering Vector Side Effect

Creator: Seonglae Cho
Created: 2025 Aug 20 14:3
Editor: Seonglae Cho
Edited: 2025 Aug 21 13:11
Refs

Analysing the Generalisation and Reliability of Steering Vectors
Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of...
https://openreview.net/forum?id=v8X70gTodR

SER

arxiv.org
https://arxiv.org/pdf/2508.12535
 

Copyright Seonglae Cho