Substantial empirical evidence suggests that neural networks represent interpretable features as linear directions in activation space.
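To make "linear direction in activation space" concrete, here is a minimal sketch on synthetic data, not on a real network. It assumes a concept shifts activations along one fixed unit vector, and recovers that vector with a difference-of-means estimate, which is one simple estimator among several. All names (`make_activations`, `true_direction`, `strength`) are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def make_activations(n, dim, direction, strength, rng):
    """Synthetic stand-in for activations (an assumption, not real model data):
    Gaussian noise, shifted along `direction` when the concept is present."""
    data = []
    for _ in range(n):
        present = rng.random() < 0.5
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if present:
            x = [a + strength * d for a, d in zip(x, direction)]
        data.append((x, present))
    return data


rng = random.Random(0)
dim = 16
true_direction = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
data = make_activations(2000, dim, true_direction, strength=4.0, rng=rng)

# Difference of class means as an estimate of the concept direction.
pos_mean = mean([x for x, present in data if present])
neg_mean = mean([x for x, present in data if not present])
estimated_direction = normalize([p - q for p, q in zip(pos_mean, neg_mean)])

# Cosine similarity between the estimated and the planted direction.
cosine = dot(estimated_direction, true_direction)

# Reading the concept off a single projection: threshold at the midpoint
# between the two class means along the estimated direction.
midpoint = [(p + q) / 2 for p, q in zip(pos_mean, neg_mean)]
threshold = dot(midpoint, estimated_direction)
correct = sum(
    (dot(x, estimated_direction) > threshold) == present for x, present in data
)
accuracy = correct / len(data)

print(f"cosine similarity: {cosine:.3f}")
print(f"projection accuracy: {accuracy:.3f}")
```

If the concept really is encoded along one direction, a single dot product is enough to read it out, which is the sense in which the representation is called linear. In a real model, the planted direction is unknown and the activations would come from a chosen layer.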
Linear representation hypothesis
Creator: Seonglae Cho
Created: 2024 May 24 4:19
Editor: Seonglae Cho
Edited: 2024 May 25 3:14
Refs