Platonic Representation Hypothesis

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2026 May 28 16:28
Editor
Edited
Edited
2026 May 28 16:34
Refs
Refs
The “Platonic Representation Hypothesis” posits that as deep learning models scale up, their learned data representations increasingly converge toward a shared, common structure. The work provides empirical evidence that models with different architectures and training objectives—even across different modalities such as vision and language—become more similar in how they measure distances between data points. It argues that this convergence arises because individual models are all learning the same underlying statistical structure of the real world that generates the data we observe.
To measure representational alignment, the authors use a kernel-based methodology. Concretely, they quantify similarity between the kernels induced by two representation functions using a mutual -nearest-neighbor (mNN) metric, defined as:
Here, are feature vectors from models , and denotes the index set of the nearest neighbors of the vector. The paper further explains mathematically that contrastive learning is, in theory, optimized to maximize pointwise mutual information (PMI), which drives convergence toward the same statistical structure regardless of modality.
For empirical validation, they analyze 78 vision models and multiple large language models (LLMs). As model scale increases and performance improves, representational alignment between models increases approximately linearly. In particular, alignment between the vision model DINOv2 and various LLMs is highly correlated with model quality metrics (e.g., OpenWebText bits-per-byte). They also show that this alignment can serve as a predictor of downstream task performance, such as HellaSwag (commonsense reasoning) and GSM8K (math problem solving).
They find that not only models directly trained with language supervision (e.g., CLIP), but also independently trained vision models such as MAE and DINOv2, share substantial representational similarity with LLMs. As language models improve, alignment scores with vision models tend to rise from around ~0.1 to ≥0.16. The fact that mNN scores remain far below the theoretical maximum of 1 (hovering around ~0.16) suggests that full convergence is not yet achieved and that further research is needed.
 
 
 
 
 
The Platonic Representation Hypothesis
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the...
The Platonic Representation Hypothesis
 
 

Recommendations