CorrSteer Abstract

Creator
Seonglae Cho
Created
2025 Feb 18 14:42
Edited
2025 Feb 18 14:42
  • Sparse autoencoders (SAEs) offer many interpretability capabilities, including steering vectors. However, existing approaches that rely on external LLMs, internal gradients, or changes in the logit distribution give little consideration to external use cases. I propose extracting task-specific features using a text classification dataset, which actually helps steer the model in that direction. I also propose a token-position-aware way to set the steering coefficient, and compare its performance degradation against the steering vector method and SAE decoder-native steering. The comparison also covers how degradation varies with the token location at which steering is applied and with the number of tokens, counted from the end of the sequence, to which it is applied.
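The abstract does not spell out the procedure, so the following is only a minimal sketch of one way the two steps could look. It assumes features are ranked by Pearson correlation between SAE activations and classification labels (an assumption suggested by the name CorrSteer, not stated above), and that steering adds a scaled SAE decoder direction to the hidden states of the last few tokens. The function names `select_features` and `steer_last_tokens`, the array shapes, and the parameters `top_k`, `coeff`, and `last_n` are all hypothetical.

```python
import numpy as np


def select_features(acts, labels, top_k=1):
    """Rank SAE features by Pearson correlation with the labels.

    acts:   (n_samples, n_features) SAE feature activations (assumed layout)
    labels: (n_samples,) numeric class labels, e.g. 0/1
    Returns the indices of the top_k most positively correlated features
    and the full correlation vector.
    """
    acts_c = acts - acts.mean(axis=0)
    labels_c = labels - labels.mean()
    denom = np.sqrt((acts_c ** 2).sum(axis=0) * (labels_c ** 2).sum())
    # Constant features have zero variance; give them correlation 0.
    safe_denom = np.where(denom == 0, 1.0, denom)
    corr = (acts_c * labels_c[:, None]).sum(axis=0) / safe_denom
    top = np.argsort(-corr)[:top_k]
    return top, corr


def steer_last_tokens(hidden, decoder, feature_ids, coeff, last_n):
    """Add coeff * (sum of decoder directions) to the last_n token positions.

    hidden:  (seq_len, d_model) hidden states for one sequence
    decoder: (n_features, d_model) SAE decoder weights (assumed layout)
    """
    steered = hidden.copy()
    direction = decoder[feature_ids].sum(axis=0)
    start = max(hidden.shape[0] - last_n, 0)
    steered[start:] += coeff * direction
    return steered
```

Varying `last_n` and `coeff` in a sketch like this is one way to probe the trade-off the abstract mentions between where steering is applied, how many tokens it touches, and how much performance degrades.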
