LF Steering

Creator: Seonglae Cho
Created: 2025 Aug 11 9:54
Edited: 2025 Aug 11 10:2

Latent Feature Level Steering

  • Top-1 influential layer identification: train a classifier that predicts answer consistency from the hidden states of paraphrase pairs → select the layer with the highest influence.
  • Feature selection: compute the latent feature differences between paraphrases that yield correct answers and those that yield incorrect ones → keep the features whose difference exceeds a threshold.
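The layer identification step can be sketched as a per-layer probe. This is a minimal sketch under stated assumptions: the page does not say which classifier is used or how "influence" is measured, so here a plain logistic-regression probe is trained on each layer, a paraphrase pair is represented by concatenating the two paraphrases' hidden states, and held-out probe accuracy stands in for influence. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def train_logistic_probe(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by full-batch gradient descent.
    (Assumed probe type; the page only says "a classifier".)"""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(consistent)
        grad = p - y                              # gradient of log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, X, y):
    """Fraction of pairs whose consistency label the probe predicts correctly."""
    return float(((X @ w + b) > 0).astype(float).__eq__(y).mean())

def select_top1_layer(h_a, h_b, consistent, train_frac=0.8, seed=0):
    """h_a, h_b: [num_layers, num_pairs, hidden_dim] hidden states of the two paraphrases.
    consistent: [num_pairs], 1.0 if the model answered both paraphrases the same way.
    Returns the layer with the best held-out probe accuracy (hypothetical proxy for
    "influence") and the per-layer scores."""
    num_layers, num_pairs, _ = h_a.shape
    idx = np.random.default_rng(seed).permutation(num_pairs)
    n_train = int(train_frac * num_pairs)
    tr, te = idx[:n_train], idx[n_train:]
    scores = []
    for layer in range(num_layers):
        # Pair representation: concatenation of both hidden states (assumption).
        X = np.concatenate([h_a[layer], h_b[layer]], axis=1)
        w, b = train_logistic_probe(X[tr], consistent[tr])
        scores.append(probe_accuracy(w, b, X[te], consistent[te]))
    return int(np.argmax(scores)), scores

# Synthetic usage: only layer 2 carries a signal about consistency.
rng = np.random.default_rng(1)
L, N, D = 4, 200, 8
h_a = rng.normal(size=(L, N, D))
h_b = rng.normal(size=(L, N, D))
consistent = (h_a[2, :, 0] - h_b[2, :, 0] > 0).astype(float)
best, scores = select_top1_layer(h_a, h_b, consistent)
print(best, [round(s, 2) for s in scores])
```

Swapping in a different classifier or pair representation (for example, the difference of the two hidden states) only changes the inner loop; the top-1 selection over per-layer scores stays the same.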
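The feature selection step reduces to a thresholded mean difference. Another hedged sketch: it assumes the latent features are activations (for example, from a sparse autoencoder) at the selected layer, that each pair contributes one paraphrase answered correctly and one answered incorrectly, and that "above a threshold" refers to the magnitude of the mean difference, so features shifted in either direction are kept. `select_features` and the toy numbers are illustrative only.

```python
import numpy as np

def select_features(acts_correct, acts_incorrect, threshold):
    """acts_correct, acts_incorrect: [num_pairs, num_features] latent feature activations
    for the correctly and incorrectly answered paraphrase of each pair (assumed layout).
    Returns (indices of features with |mean difference| > threshold, sorted by that
    magnitude in descending order, and the full mean-difference vector)."""
    diff = (acts_correct - acts_incorrect).mean(axis=0)  # per-feature mean difference
    selected = np.flatnonzero(np.abs(diff) > threshold)  # features above the threshold
    order = np.argsort(-np.abs(diff[selected]))          # strongest differences first
    return selected[order], diff

# Toy usage with two pairs and four features.
acts_correct = np.array([[1.0, 0.2, 0.0, 0.5],
                         [0.8, 0.1, 0.0, 0.7]])
acts_incorrect = np.array([[0.1, 0.2, 0.6, 0.5],
                           [0.1, 0.0, 0.4, 0.6]])
idx, diff = select_features(acts_correct, acts_incorrect, threshold=0.3)
print(idx.tolist())  # [0, 2]
```

The sign of each selected entry of `diff` shows whether the feature is more active on the correctly answered side (positive) or the incorrectly answered side (negative).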