Linear Representation Hypothesis

Created: 2024 May 24 4:19
Creator: Seonglae Cho
Edited: 2025 Dec 13 18:03

LRH

There is significant empirical evidence that neural networks represent many features as interpretable linear directions in activation space.
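As a minimal illustration of what a "linear direction" means here (a synthetic sketch, not any specific paper's method): if a concept is linearly represented, the difference of class means recovers its direction, and a one-dimensional projection separates activations with versus without the concept.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden dimension

# Synthetic activations: the concept shifts activations along one direction
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
with_concept = rng.normal(size=(200, d)) + 2.0 * true_dir
without_concept = rng.normal(size=(200, d))

# Difference-of-means estimate of the concept direction
concept_dir = with_concept.mean(axis=0) - without_concept.mean(axis=0)
concept_dir /= np.linalg.norm(concept_dir)

print("cosine with ground-truth direction:", float(concept_dir @ true_dir))
print("mean projection (with concept):   ", float((with_concept @ concept_dir).mean()))
print("mean projection (without concept):", float((without_concept @ concept_dir).mean()))
```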
2013 Efficient Estimation of Word Representations in Vector Space (Tomas Mikolov et al., Google)

2013 Linguistic Regularities in Continuous Space Word Representations (Tomas Mikolov et al., Microsoft): vector composition, e.g. King − Man + Woman ≈ Queen
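The analogy can be reproduced with pretrained word2vec vectors; a sketch assuming the gensim library and its downloadable word2vec-google-news-300 vectors:

```python
# Requires: pip install gensim (the first call downloads ~1.6 GB of vectors)
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# King − Man + Woman ≈ Queen as nearest-neighbor vector arithmetic
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected top result: ('queen', ...)
```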

2022 Residual Stream as a Word Embedding space

2023 Residual Stream linearity evidence

ICML 2024 workshop and ICLR 2025 (Kiho Park et al.): the LRH is formalized, and experiments with Gemma show that categorical concepts are represented as a simplex in representation space.
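Roughly, the categorical claim can be stated as follows (notation here is illustrative, loosely adapted from Park et al.):

```latex
% A categorical concept with values w_1, ..., w_k is represented by
% vectors \ell(w_1), ..., \ell(w_k) whose convex hull is a (k-1)-simplex:
\[
\mathrm{conv}\{\ell(w_1), \dots, \ell(w_k)\} \;\cong\; \Delta^{k-1},
\]
% and each binary contrast between two values is an ordinary linear
% direction, the difference of the corresponding vertices:
\[
\ell(w_i) - \ell(w_j).
\]
```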
Multidimensional features that live in subspaces of more than one dimension are not by themselves sufficient to justify non-linear representations: such a feature still occupies a linear subspace, just not a single direction.
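A toy illustration of this point, with synthetic data: a circular feature is not one direction, yet it still lives in a two-dimensional linear subspace, which PCA recovers.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500

# Embed a circular (2-D) feature into a random 2-D plane of a d-dim space
basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))  # orthonormal plane
theta = rng.uniform(0, 2 * np.pi, size=n)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
acts = circle @ basis.T + 0.01 * rng.normal(size=(n, d))

# PCA via SVD: the top two components capture almost all the variance,
# i.e. the "non-linear" circular feature occupies a *linear* subspace
u, s, vt = np.linalg.svd(acts - acts.mean(axis=0), full_matrices=False)
var = s**2 / (s**2).sum()
print("variance in top-2 components:", float(var[:2].sum()))  # ~1.0
```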
COLM 2024 – The Geometry of Truth
LLMs represent factuality (True/False) linearly in their internal representations. Larger models exhibit a more distinct and generalizable 'truth direction'.
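A sketch of the kind of linear "truth direction" probe used in this line of work, on synthetic stand-in activations (mass-mean probing is one of the probes studied there; the data below is fabricated for illustration):

```python
import numpy as np

def mass_mean_probe(acts_true, acts_false):
    """Truth direction as the difference of class means (mass-mean probe)."""
    direction = acts_true.mean(axis=0) - acts_false.mean(axis=0)
    return direction / np.linalg.norm(direction)

def classify(acts, direction, threshold=0.0):
    """Call a statement true if its activation projects past the threshold."""
    return acts @ direction > threshold

# Synthetic stand-in for residual-stream activations of true/false statements
rng = np.random.default_rng(0)
d = 128
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)
acts_true = rng.normal(size=(300, d)) + 1.5 * truth_dir
acts_false = rng.normal(size=(300, d)) - 1.5 * truth_dir

direction = mass_mean_probe(acts_true, acts_false)
acc = (classify(acts_true, direction).mean()
       + (~classify(acts_false, direction)).mean()) / 2
print(f"probe accuracy on synthetic data: {acc:.2f}")
```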
 
 
