Entity recognition vector
Using sparse autoencoders (SAEs), we discovered linear directions within large language models' activations that distinguish "known" from "unknown" entities. Steering along the "known" latent increases the likelihood of hallucination, while steering along the "unknown" latent promotes answer refusal. This suggests the model encodes a form of self-knowledge, a representation of whether it recognizes an entity, which causally influences whether it refuses or generates incorrect information.
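The steering intervention described above can be sketched as adding a scaled SAE decoder direction to the residual stream. The snippet below is a minimal toy illustration, not the actual experimental code: the model activations and the "known entity" direction are random stand-ins, and `steer_with_latent` is a hypothetical helper name.

```python
import torch

def steer_with_latent(hidden: torch.Tensor, direction: torch.Tensor,
                      scale: float) -> torch.Tensor:
    """Add a unit-normalized latent direction to every residual-stream
    position, scaled by `scale` (positive scale amplifies the latent)."""
    unit = direction / direction.norm()
    return hidden + scale * unit

# Toy stand-ins for real activations and a real SAE decoder row.
torch.manual_seed(0)
hidden = torch.randn(4, 16)   # (seq_len, d_model)
known_dir = torch.randn(16)   # hypothetical "known entity" direction

steered = steer_with_latent(hidden, known_dir, scale=5.0)

# The intervention shifts each position's projection onto the
# direction by exactly `scale`:
unit = known_dir / known_dir.norm()
proj_before = hidden @ unit
proj_after = steered @ unit
```

In the actual experiments this kind of addition would be applied inside a forward hook at a chosen layer; here it is shown as a pure tensor operation for clarity.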
These latents correlate only weakly with simple token probabilities, suggesting they capture a more sophisticated form of knowledge recognition than mere predictability.