A joint-embedding predictive architecture (JEPA) self-supervised learning approach designed to scale stably "without heuristics": it theoretically characterizes "the distribution a good embedding should follow" and then enforces that distribution through regularization.
Why Gaussian?: Proposes a theory that, under certain conditions, an isotropic Gaussian embedding distribution is the unique minimizer of expected downstream prediction risk, regardless of whether downstream tasks use linear or nonlinear probes.
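A schematic rendering of that claim, with placeholder notation that is mine rather than the paper's:

```latex
% Schematic only: \mathcal{P} is a set of embedding distributions,
% \mathcal{F} a probe family (linear or nonlinear), and \mathcal{R}
% the downstream prediction risk. The claim is that the risk-minimizing
% embedding distribution is isotropic Gaussian.
p^{\star}
  \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}}
        \; \mathbb{E}_{f \in \mathcal{F}}\!\left[ \mathcal{R}(p, f) \right]
  \;=\; \mathcal{N}\!\left(0,\ \sigma^{2} I_{d}\right)
```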
SIGReg: To pull embeddings toward the ideal distribution above (an isotropic Gaussian), SIGReg (Sketched Isotropic Gaussian Regularization) is proposed. Roughly, the approach:
- sketches the embeddings via multiple random 1D projections, and
- applies a univariate goodness-of-fit test to each projection so that its 1D distribution matches the corresponding 1D projection of the target distribution, emphasizing characteristic function-based tests (see the sketch after this list)
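A minimal PyTorch sketch of this idea, assuming an Epps-Pulley-style statistic that compares each projection's empirical characteristic function against the standard Gaussian CF `exp(-t^2/2)`; the function name, direction count, and frequency grid here are illustrative choices, not the paper's:

```python
import torch

def sigreg_loss(z, num_directions=64, num_freqs=17, freq_max=3.0):
    """Hypothetical SIGReg sketch (not the official implementation).

    z: (N, D) batch of embeddings. Projects z onto random unit directions,
    then penalizes the squared distance between each projection's empirical
    characteristic function and the standard-Gaussian CF exp(-t^2 / 2),
    averaged over a uniform frequency grid.
    """
    N, D = z.shape
    # Random unit directions ("sketches") on the D-dimensional sphere.
    dirs = torch.randn(D, num_directions, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    proj = z @ dirs                            # (N, num_directions)

    t = torch.linspace(-freq_max, freq_max, num_freqs, device=z.device)
    tx = proj.unsqueeze(-1) * t                # (N, num_directions, num_freqs)
    # Empirical CF of each 1D projection, averaged over the batch.
    ecf_real = torch.cos(tx).mean(dim=0)       # (num_directions, num_freqs)
    ecf_imag = torch.sin(tx).mean(dim=0)
    # CF of the target N(0, 1) is real-valued: exp(-t^2 / 2).
    target = torch.exp(-0.5 * t ** 2)
    # Squared CF distance, averaged over directions and frequencies.
    err = (ecf_real - target) ** 2 + ecf_imag ** 2
    return err.mean()
```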
LeJEPA objective function: A simple sum of the standard JEPA "predictive loss between views" and the SIGReg term; the claim is that this alone prevents collapse, without tricks like stop-gradient, teacher-student networks, EMA schedules, or whitening (combined below).
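Putting the two terms together, a hypothetical training-loss sketch reusing `sigreg_loss` from above; `lam` stands in for the paper's single trade-off hyperparameter, and MSE is one common choice of predictive loss:

```python
import torch.nn.functional as F

def lejepa_loss(z_pred, z_target, lam=1.0):
    """LeJEPA objective sketch: prediction loss + lam * SIGReg.

    z_pred:   predictor output for one view, shape (N, D)
    z_target: embedding of the other view,  shape (N, D)
    """
    pred_loss = F.mse_loss(z_pred, z_target)   # predictive loss between views
    reg_loss = sigreg_loss(z_target)           # pull embeddings toward N(0, I)
    return pred_loss + lam * reg_loss
```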

Seonglae Cho