Truth Subspace

Probing studies have found that language models contain a low-dimensional linear subspace that separates true statements from false ones. The Truth Co-occurrence Hypothesis (TCH) proposes one reason such a subspace emerges: in real text, true statements tend to co-occur with other true statements, and false statements with other false ones, so tracking truthfulness helps the model predict later tokens.
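
To make "a linear subspace that separates truth from falsehood" concrete, here is a minimal sketch of a difference-of-means probe. It is an illustration, not the method of the linked paper: the hidden states are synthetic Gaussians with a planted axis, and the names fit_truth_direction and predict_true are hypothetical. With a real model, hidden would be replaced by activations collected on labeled true/false statements.

```python
import numpy as np

def fit_truth_direction(hidden, labels):
    """Return a unit direction and midpoint from class means (true=1, false=0)."""
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(labels).astype(bool)
    mu_true = hidden[labels].mean(axis=0)
    mu_false = hidden[~labels].mean(axis=0)
    direction = mu_true - mu_false
    direction /= np.linalg.norm(direction)
    midpoint = 0.5 * (mu_true + mu_false)
    return direction, midpoint

def predict_true(hidden, direction, midpoint):
    """Classify as true when the projection falls on the 'true' side of the midpoint."""
    return (np.asarray(hidden, dtype=float) - midpoint) @ direction > 0

# Synthetic stand-in for hidden states (assumption: one planted truth axis).
rng = np.random.default_rng(0)
d, n = 64, 500
axis = rng.normal(size=d)
axis /= np.linalg.norm(axis)
labels = rng.integers(0, 2, size=n)
hidden = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * labels - 1, axis)

direction, midpoint = fit_truth_direction(hidden, labels)
accuracy = (predict_true(hidden, direction, midpoint) == labels.astype(bool)).mean()
cosine = float(direction @ axis)
print(f"probe accuracy: {accuracy:.3f}, cosine with planted axis: {cosine:.3f}")
```

A one-dimensional probe is the simplest case; a higher-dimensional truth subspace would need several directions, for example from a logistic-regression probe or PCA on class differences.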

Training proceeds in two phases. First, the model rapidly forms key-value associative memories (fact memorization). Then, over a longer horizon, a linear axis separating true from false gradually emerges. For a true context, the vector entering layer-norm has a smaller norm, so normalization scales it up more; this lowers the effective softmax temperature and makes the prediction more confident ("sharpening"). For a false context the norm is larger, so the prediction is flatter. Layer-norm thus amplifies the truth signal.
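
The sharpening argument can be checked with a toy calculation. This sketch is an assumption-laden illustration, not the paper's construction: the vectors signal and extra, the readout matrix, and the scale alpha are all hypothetical, and the extra component is chosen so the readout ignores it. Under those assumptions the logits are the signal logits divided by the standard deviation of the pre-layer-norm vector, so that standard deviation acts as the softmax temperature.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Layer-norm without learned gain or bias: zero mean, unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p)).sum())

# Hypothetical toy vectors: both are zero-mean, and the readout sees only `signal`.
signal = np.array([2.0, -2.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
extra = np.array([0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0])
readout = np.zeros((3, 8))
readout[0, 0] = readout[1, 1] = readout[2, 2] = 1.0

def predict(alpha):
    """Return the pre-layer-norm norm and the next-token distribution."""
    x = signal + alpha * extra  # larger alpha means larger pre-layer-norm norm
    probs = softmax(readout @ layer_norm(x))
    return float(np.linalg.norm(x)), probs

norm_true, p_true = predict(alpha=0.0)    # stand-in for a "true" context
norm_false, p_false = predict(alpha=3.0)  # stand-in for a "false" context
print("norms:", norm_true, norm_false)
print("entropies:", entropy(p_true), entropy(p_false))
```

The smaller-norm input yields the lower-entropy (sharper) distribution, because layer-norm divides by a smaller standard deviation and so scales the logits up.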

This is distinct from the growth of the norm across layers: the comparison here is between tokens within the same layer. In a similar spirit, one key reason attention sinks form near BOS can be interpreted as "raising the effective temperature to preserve diversity in early tokens."

https://arxiv.org/pdf/2510.15804
