FGAA

Creator
Seonglae Cho
Created
2025 May 10 22:34
Edited
2025 May 10 22:42

Feature Guided Activation Additions

A technique that builds interpretable, precise steering vectors through density filtering, BOS feature removal, top-k feature selection, and a linear-approximation optimization. Across most evaluated tasks it shows stronger behavioral control and more consistent outputs than CAA (Contrastive Activation Addition), SAE decoder steering, and SAE-TS.
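  1. Take the contrastive difference of SAE feature activations between positive and negative examples
  2. Filter by density, keeping only low-density SAE features
  3. Remove features that fire on the BOS token
  4. Select the top-k features
  5. Convert the result into a steering vector with an SAE-TS-like linear effect approximator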
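The five steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the reference FGAA implementation: SAE activations are given per example and already averaged over token positions, `density` is each feature's firing frequency, and the BOS-related feature indices are known in advance. The function names, the defaults for `density_threshold`, `n_pos`, and `n_neg`, and the least-squares inversion in step 5 are hypothetical choices made for illustration.

```python
import numpy as np


def fgaa_target(pos_acts, neg_acts, density, bos_features,
                density_threshold=0.01, n_pos=20, n_neg=5):
    """Steps 1-4: build a sparse target vector in SAE feature space.

    pos_acts, neg_acts: (n_examples, d_sae) SAE feature activations,
        assumed already averaged over token positions per example.
    density: (d_sae,) fraction of tokens on which each feature is active.
    bos_features: indices of features that fire on the BOS token.
    Default thresholds and counts are hypothetical, not from the source.
    """
    # 1. Contrastive difference of mean feature activations
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    # 2. Zero out high-density (frequently active, less specific) features
    v[density > density_threshold] = 0.0
    # 3. Zero out features tied to the BOS token
    v[np.asarray(list(bos_features), dtype=int)] = 0.0
    # 4. Keep the n_pos largest positive and n_neg most negative entries
    order = np.argsort(v)                 # ascending order of values
    pos_idx = order[::-1][:n_pos]
    neg_idx = order[:n_neg]
    keep_pos = pos_idx[v[pos_idx] > 0]    # discard zeros picked up above
    keep_neg = neg_idx[v[neg_idx] < 0]
    target = np.zeros_like(v)
    target[keep_pos] = v[keep_pos]
    target[keep_neg] = v[keep_neg]
    return target


def fgaa_steering_vector(target, M, b):
    """Step 5: map the SAE-space target to a residual-stream direction.

    Assumes an SAE-TS-style linear effect approximator: adding x (d_model,)
    changes SAE features by roughly x @ M + b, with M: (d_model, d_sae) and
    b: (d_sae,). Least squares is a hypothetical way to invert it.
    """
    # Solve M.T @ x ~= target - b for x in the least-squares sense
    x, *_ = np.linalg.lstsq(M.T, target - b, rcond=None)
    return x / np.linalg.norm(x)  # unit norm; scale separately when steering
```

The returned vector is unit-norm, so in use it would be multiplied by a steering coefficient and added to the residual stream at the SAE's layer. Least squares is used here because it directly minimizes the gap between predicted and target feature effects under the linear approximator; the original method may derive the vector from the approximator differently.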