Universality Hypothesis

Creator: Seonglae Cho
Created: 2024 Apr 6 13:13
Edited: 2024 Dec 11 17:15
Different models learn similar features and circuits when trained on similar tasks.
The discovery of similar circuits in both human brains and AI models further supports this claim.

Figure: neuron activations in the left prefrontal cortex in response to words, compared with AI neuron activations (word embeddings in the paper).

Semantic encoding during language comprehension at single-cell resolution
A related line of evidence comes from sparse autoencoder (SAE) features. After matching feature neurons across models by activation correlation, representational similarity metrics such as Singular Value Canonical Correlation Analysis (SVCCA) are applied to the matched features. These experiments reveal similar SAE feature spaces across various LLMs, providing evidence for feature universality.
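The matching-then-SVCCA procedure described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the original paper's code: the helper names (`feature_correlations`, `match_features`, `svcca`) are hypothetical, each model-A feature is paired with its single most correlated model-B feature (a simple many-to-one argmax rule; the original work may filter or pair features differently), and the 99% variance threshold for the SVD step is an assumed default. The activations are synthetic: model B's features are a shuffled, noisy copy of model A's.

```python
import numpy as np


def feature_correlations(acts_a, acts_b, eps=1e-8):
    """Pearson correlation between every feature of model A and every feature of model B.

    acts_a: (n_tokens, d_a) SAE feature activations of model A on a shared set of inputs.
    acts_b: (n_tokens, d_b) SAE feature activations of model B on the same inputs.
    Returns a (d_a, d_b) correlation matrix.
    """
    za = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + eps)
    zb = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + eps)
    return za.T @ zb / acts_a.shape[0]


def match_features(corr):
    """Assumption: pair each A feature with its most correlated B feature (many-to-one)."""
    return corr.argmax(axis=1)


def svcca(x, y, keep=0.99):
    """Mean canonical correlation between x and y after SVD reduction (SVCCA)."""

    def top_basis(m):
        # Center, then keep the left singular vectors explaining `keep` of the variance.
        m = m - m.mean(axis=0)
        u, s, _ = np.linalg.svd(m, full_matrices=False)
        explained = np.cumsum(s**2) / np.sum(s**2)
        r = int(np.searchsorted(explained, keep)) + 1
        return u[:, :r]

    ux, uy = top_basis(x), top_basis(y)
    # Canonical correlations = singular values of the product of the two orthonormal bases.
    rho = np.linalg.svd(ux.T @ uy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


# Synthetic demo: model B "learns" the same features as model A, shuffled and noisy.
rng = np.random.default_rng(0)
n_tokens, d = 2000, 16
acts_a = np.maximum(rng.normal(size=(n_tokens, d)), 0.0)  # ReLU-like activations
perm = rng.permutation(d)
acts_b = acts_a[:, perm] + 0.1 * rng.normal(size=(n_tokens, d))

corr = feature_correlations(acts_a, acts_b)
match = match_features(corr)  # match[i] = B feature paired with A feature i
score_matched = svcca(acts_a, acts_b[:, match])

# Baseline: features unrelated to model A's.
unrelated = np.maximum(rng.normal(size=(n_tokens, d)), 0.0)
score_random = svcca(acts_a, unrelated)
print(f"matched SVCCA: {score_matched:.3f}, unrelated SVCCA: {score_random:.3f}")
```

On this synthetic data the matched score should be close to 1 and the unrelated baseline much lower. Because CCA is invariant to invertible linear transformations, merely shuffling columns would not change the score; the pairing step matters when the two feature sets differ in size or many features lack a good counterpart, as is typical when comparing SAE dictionaries from different models.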

Convergent learning (2016)

A Toy Model of Universality (2023)

Evidence
