Universal Subspace Hypothesis

Creator: Seonglae Cho
Created: 2026 Jan 9 16:35
Edited: 2026 Jan 12 17:21

Universal Weight Subspace Hypothesis

Large-scale experiments show that, despite differences in initialization, data, and task, models converge after training to a similar low-dimensional region of parameter space (a shared subspace). This universal subspace appears repeatedly across diverse models (e.g., LoRA adapters, ViT, LLaMA), and a very small number of principal component directions explain most of the total weight variance.
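As a rough illustration of this claim (a minimal sketch, not the procedure from any specific paper), the code below stacks flattened weights from several independently trained runs and measures how much variance a handful of principal directions capture; the `weight_subspace_spectrum` helper and the synthetic "models" are assumptions made for the example.

```python
import numpy as np

def weight_subspace_spectrum(weight_vectors, k=8):
    """Stack flattened weights from several runs and report how much of the
    variance across runs the top-k principal directions explain."""
    W = np.stack(weight_vectors)            # shape: (num_models, num_params)
    W = W - W.mean(axis=0, keepdims=True)   # center across models
    s = np.linalg.svd(W, compute_uv=False)  # singular values give the PCA spectrum
    var_ratio = s**2 / np.sum(s**2)
    return var_ratio[:k], var_ratio[:k].sum()

# Synthetic check: 20 "models" whose weights lie near a shared 3-D subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 10_000))
models = [rng.normal(size=3) @ basis + 0.01 * rng.normal(size=10_000)
          for _ in range(20)]
top_var, cumulative = weight_subspace_spectrum(models, k=3)
print(top_var, cumulative)  # the top 3 directions explain nearly all variance
```

With real checkpoints the same recipe applies: flatten each model's weights into one vector, stack, center, and inspect the singular value spectrum; the hypothesis predicts a sharp drop after a few directions.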

Manifold

Deep network training, when viewed not in weight space but in prediction (probability-distribution) space, follows nearly the same low-dimensional manifold of trajectories despite vastly different settings such as architecture, optimizer, regularization, augmentation, and initialization. In other words, "deep learning optimization may be fundamentally low-dimensional in nature."
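A minimal sketch of that viewpoint, under the assumption that per-checkpoint predicted probabilities on a fixed probe set are available: embed the checkpoints of several runs with classical MDS on pairwise Hellinger distances (a simple stand-in for the more specialized embeddings used in the literature) and compare the resulting low-dimensional trajectories. The `trajectories` input format and helper names are assumptions made for the example.

```python
import numpy as np

def hellinger(p, q):
    # Average Hellinger distance between two sets of predicted distributions,
    # each of shape (num_probe, num_classes).
    return np.mean(np.linalg.norm(np.sqrt(p) - np.sqrt(q), axis=1)) / np.sqrt(2)

def embed_trajectories(trajectories, dims=2):
    """trajectories: dict mapping run name -> list of per-checkpoint
    probability arrays. Returns low-dim coordinates per checkpoint."""
    points, labels = [], []
    for name, ckpts in trajectories.items():
        points.extend(ckpts)
        labels.extend([name] * len(ckpts))
    n = len(points)
    D = np.array([[hellinger(points[i], points[j]) for j in range(n)]
                  for i in range(n)])
    # Classical MDS: double-center the squared distance matrix, eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D**2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dims]
    coords = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
    return coords, labels  # trajectories from different runs can be overlaid here
```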
Mode Connectivity: Two DNN solutions (SGD solutions trained from different seeds) show a high loss barrier along the straight line between them, but once a simple curved path is learned (e.g., a one-bend polygonal chain or a quadratic Bézier curve), the training loss and test error remain nearly constant and low along the entire path connecting the two solutions.
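A minimal sketch of curve finding in that spirit, not a definitive implementation: at each step, sample a point t on a quadratic Bézier curve between the two solutions and minimize the loss there, so the whole path settles into a low-loss region. The names and the `loss_at` helper are assumptions; `loss_at` is assumed to compute the training loss as a differentiable function of a flat parameter vector (e.g., via `torch.func.functional_call` after splitting the vector back into named parameters).

```python
import torch

def train_bezier_bend(theta_a, theta_b, loss_at, steps=1000, lr=1e-2):
    """Learn the middle control point of a quadratic Bezier curve
    phi(t) = (1-t)^2 * theta_a + 2t(1-t) * bend + t^2 * theta_b
    so that the loss stays low along the path from theta_a to theta_b."""
    theta_a, theta_b = theta_a.detach(), theta_b.detach()
    bend = torch.nn.Parameter((theta_a + theta_b) / 2)  # learnable control point
    opt = torch.optim.Adam([bend], lr=lr)
    for _ in range(steps):
        t = torch.rand(())  # sample a point on the curve, t ~ U[0, 1]
        phi = (1 - t) ** 2 * theta_a + 2 * t * (1 - t) * bend + t ** 2 * theta_b
        loss = loss_at(phi)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return bend.detach()
```

After training, evaluating `loss_at(phi(t))` on a grid of t values should stay nearly flat, whereas the straight-line interpolation between the same two solutions shows a pronounced barrier.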