LRH
There is significant empirical evidence suggesting that neural networks represent interpretable features as linear directions in activation space.
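A minimal numpy sketch of this idea, using entirely synthetic activations (the shapes, values, and the difference-of-means estimator below are illustrative assumptions, not taken from any particular paper or model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations: 100 samples where a concept is present and 100
# where it is absent, each a 512-dimensional hidden-state vector.
acts_pos = rng.normal(loc=0.5, scale=1.0, size=(100, 512))
acts_neg = rng.normal(loc=-0.5, scale=1.0, size=(100, 512))

# Difference of means: one common way to estimate a linear "concept direction".
direction = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
direction /= np.linalg.norm(direction)

# Projections onto the direction separate the two groups, which is what a
# linear representation of the concept predicts.
print("concept present, mean projection:", (acts_pos @ direction).mean())
print("concept absent,  mean projection:", (acts_neg @ direction).mean())
```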
LRH Notion
2013 Efficient Estimation of Word Representations in Vector Space (Tomas Mikolov, Google)
Tomas Mikolov, Microsoft (Vector Composition: King - Man + Woman ≈ Queen)
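A toy illustration of the vector-composition arithmetic, using made-up 3-dimensional vectors rather than actual word2vec embeddings:

```python
import numpy as np

# Toy stand-ins for word embeddings (real vectors would come from a trained
# model); the only point here is the arithmetic king - man + woman ≈ queen.
emb = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "man":    np.array([0.1, 0.9, 0.1]),
    "woman":  np.array([0.1, 0.1, 0.9]),
    "prince": np.array([0.8, 0.8, 0.2]),
    "apple":  np.array([0.1, 0.2, 0.1]),
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(vec, exclude):
    # Nearest neighbour by cosine similarity, excluding the query words.
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], vec))

query = emb["king"] - emb["man"] + emb["woman"]
print(nearest(query, exclude={"king", "man", "woman"}))  # -> queen
```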
2022 Residual Stream
Word Embedding
2023 Residual Stream linearity evidence
Multidimensional features, which live in subspaces of more than one dimension, are not by themselves sufficient to justify non-linear representations.
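A small synthetic sketch of this point (the circular feature and dimensions below are assumptions for illustration): a feature spread over a 2-dimensional subspace is still recovered by PCA, a purely linear operation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a multidimensional (e.g. circular) feature:
# 200 points on a circle in a 2D plane, embedded in a 512-dimensional space.
angles = rng.uniform(0, 2 * np.pi, size=200)
circle_2d = np.stack([np.cos(angles), np.sin(angles)], axis=1)

plane = np.linalg.qr(rng.normal(size=(512, 2)))[0]  # random orthonormal 2D plane
acts = circle_2d @ plane.T + 0.01 * rng.normal(size=(200, 512))

# PCA via SVD, a purely linear operation, recovers the 2D subspace: the top two
# singular values dominate, so the feature is multidimensional yet still lives
# in a linear subspace.
_, s, _ = np.linalg.svd(acts - acts.mean(axis=0), full_matrices=False)
print(s[:4])
```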


COLM 2024 – The Geometry of Truth
LLMs represent factuality (True/False) linearly in their internal representations. Larger models exhibit a more distinct and generalizable 'truth direction'.
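A hedged sketch of such a linear truth probe, assuming scikit-learn and fully synthetic activations in place of the real LLM hidden states used in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical residual-stream activations for labeled true/false statements
# (synthetic; the paper extracts real activations over curated datasets).
truth_direction = rng.normal(size=256)
truth_direction /= np.linalg.norm(truth_direction)

def fake_activations(n, label):
    sign = 1.5 if label else -1.5
    return rng.normal(size=(n, 256)) + sign * truth_direction

# Train a linear probe on one batch of statements...
X_train = np.vstack([fake_activations(200, True), fake_activations(200, False)])
y_train = np.array([1] * 200 + [0] * 200)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ...and evaluate on held-out statements. In the paper, probes like this
# transfer across datasets, which is the evidence for a shared "truth direction".
X_test = np.vstack([fake_activations(100, True), fake_activations(100, False)])
y_test = np.array([1] * 100 + [0] * 100)
print("held-out accuracy:", probe.score(X_test, y_test))
```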

Seonglae Cho