Because features inside the model are internally correlated, steering one feature also affects other features. This reveals a limitation of the Neuron SAE: it did not achieve strict Monosemanticity.
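Below is a minimal toy sketch of how this spillover can happen. It assumes a tied-weight SAE in which feature k is read out as ReLU(⟨d_k, x⟩) and in which steering feature i means adding α·d_i to the activation vector x. The directions, the input, and α are hypothetical values chosen for illustration. Real SAEs use separate encoder weights and biases, and their features are not hand-picked.

```python
# Toy illustration with hypothetical weights: steering one feature shifts
# another feature's activation when their directions are not orthogonal.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def relu(z):
    return max(0.0, z)

def encode(x, directions):
    """Feature activations ReLU(<d_k, x>), assuming tied encoder/decoder weights."""
    return [relu(dot(d, x)) for d in directions]

def steer(x, direction, alpha):
    """Add alpha times a feature's decoder direction to the activation vector."""
    return [xi + alpha * di for xi, di in zip(x, direction)]

# Hypothetical unit-norm feature directions in a 3-d activation space.
directions = [
    [1.0, 0.0, 0.0],  # feature 0: the feature we steer
    [0.6, 0.8, 0.0],  # feature 1: cosine 0.6 with feature 0 (correlated)
    [0.0, 0.0, 1.0],  # feature 2: orthogonal to feature 0
]

x = [0.0, 0.5, 0.2]
before = encode(x, directions)
after = encode(steer(x, directions[0], alpha=2.0), directions)

for k, (b, a) in enumerate(zip(before, after)):
    print(f"feature {k}: {b:.2f} -> {a:.2f}")
# feature 0: 0.00 -> 2.00
# feature 1: 0.40 -> 1.60
# feature 2: 0.20 -> 0.20
```

Only feature 0 was steered, yet feature 1 rises from 0.40 to 1.60 because its direction overlaps with feature 0's. Feature 2 is untouched only because it is exactly orthogonal. In this toy, any non-zero overlap between feature directions is enough to produce a cross-feature effect.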
Cross-Domain Effects by Feature Steering
Creator: Seonglae Cho
Created: 2024 Dec 1 12:2
Editor: Seonglae Cho
Edited: 2024 Dec 1 12:9