Universal Subspace Hypothesis

Creator: Seonglae Cho
Created: 2026 Jan 9 16:35
Edited: 2026 Jan 12 17:21

Universal Weight Subspace Hypothesis

Large-scale experiments show that, despite differences in initialization, data, and task, models converge after training to a similar low-dimensional region of parameter space (a shared subspace). This universal subspace appears repeatedly across diverse models (e.g., LoRA adapters, ViT, LLaMA), and a very small number of principal component directions explain most of the total weight variance.
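As a rough illustration of this claim (a minimal sketch, not the procedure from any specific paper), the code below stacks flattened weights from several independently trained runs and measures how much variance a handful of principal directions capture; the `weight_subspace_spectrum` helper and the synthetic "models" are assumptions made for the example.

```python
import numpy as np

def weight_subspace_spectrum(weight_vectors, k=8):
    """Stack flattened weights from several runs and report how much of the
    variance across runs the top-k principal directions explain."""
    W = np.stack(weight_vectors)            # shape: (num_models, num_params)
    W = W - W.mean(axis=0, keepdims=True)   # center across models
    s = np.linalg.svd(W, compute_uv=False)  # singular values give the PCA spectrum
    var_ratio = s**2 / np.sum(s**2)
    return var_ratio[:k], var_ratio[:k].sum()

# Synthetic check: 20 "models" whose weights lie near a shared 3-D subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 10_000))
models = [rng.normal(size=3) @ basis + 0.01 * rng.normal(size=10_000)
          for _ in range(20)]
top_var, cumulative = weight_subspace_spectrum(models, k=3)
print(top_var, cumulative)  # the top 3 directions explain nearly all variance
```

With real checkpoints the same recipe applies: flatten each model's weights into one vector, stack, center, and inspect the singular value spectrum; the hypothesis predicts a sharp drop after a few directions.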

Manifold

Deep network training, when viewed not in weight space but in prediction (probability-distribution) space, follows nearly the same low-dimensional manifold of trajectories despite vastly different settings such as architecture, optimizer, regularization, augmentation, and initialization. In other words, "deep learning optimization may be fundamentally low-dimensional in nature."
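A minimal sketch of that viewpoint, under the assumption that per-checkpoint predicted probabilities on a fixed probe set are available: embed the checkpoints of several runs with classical MDS on pairwise Hellinger distances (a simple stand-in for the more specialized embeddings used in the literature) and compare the resulting low-dimensional trajectories. The `trajectories` input format and helper names are assumptions made for the example.

```python
import numpy as np

def hellinger(p, q):
    # Average Hellinger distance between two sets of predicted distributions,
    # each of shape (num_probe, num_classes).
    return np.mean(np.linalg.norm(np.sqrt(p) - np.sqrt(q), axis=1)) / np.sqrt(2)

def embed_trajectories(trajectories, dims=2):
    """trajectories: dict mapping run name -> list of per-checkpoint
    probability arrays. Returns low-dim coordinates per checkpoint."""
    points, labels = [], []
    for name, ckpts in trajectories.items():
        points.extend(ckpts)
        labels.extend([name] * len(ckpts))
    n = len(points)
    D = np.array([[hellinger(points[i], points[j]) for j in range(n)]
                  for i in range(n)])
    # Classical MDS: double-center the squared distance matrix, eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D**2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dims]
    coords = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
    return coords, labels  # trajectories from different runs can be overlaid here
```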
Mode Connectivity: Two DNN solutions (SGD solutions trained from different seeds) show a high loss barrier along the straight line between them, but once a simple curved path is learned (e.g., a one-bend polygonal chain or a quadratic Bézier curve), the training loss and test error remain nearly constant and low along the entire path connecting the two solutions.
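A minimal sketch of curve finding in that spirit, not a definitive implementation: at each step, sample a point t on a quadratic Bézier curve between the two solutions and minimize the loss there, so the whole path settles into a low-loss region. The names and the `loss_at` helper are assumptions; `loss_at` is assumed to compute the training loss as a differentiable function of a flat parameter vector (e.g., via `torch.func.functional_call` after splitting the vector back into named parameters).

```python
import torch

def train_bezier_bend(theta_a, theta_b, loss_at, steps=1000, lr=1e-2):
    """Learn the middle control point of a quadratic Bezier curve
    phi(t) = (1-t)^2 * theta_a + 2t(1-t) * bend + t^2 * theta_b
    so that the loss stays low along the path from theta_a to theta_b."""
    theta_a, theta_b = theta_a.detach(), theta_b.detach()
    bend = torch.nn.Parameter((theta_a + theta_b) / 2)  # learnable control point
    opt = torch.optim.Adam([bend], lr=lr)
    for _ in range(steps):
        t = torch.rand(())  # sample a point on the curve, t ~ U[0, 1]
        phi = (1 - t) ** 2 * theta_a + 2 * t * (1 - t) * bend + t ** 2 * theta_b
        loss = loss_at(phi)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return bend.detach()
```

After training, evaluating `loss_at(phi(t))` on a grid of t values should stay nearly flat, whereas the straight-line interpolation between the same two solutions shows a pronounced barrier.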