A joint-embedding predictive architecture (JEPA) self-supervised learning approach designed to scale stably "without heuristics": it theoretically characterizes "the distribution a good embedding should follow" and then enforces that distribution through regularization.
Why Gaussian?: Proposes a theory that, under certain conditions, an isotropic Gaussian embedding distribution is the unique minimizer of expected downstream prediction risk, regardless of whether downstream tasks use linear or nonlinear probes.
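A schematic rendering of that claim, with placeholder notation that is mine rather than the paper's:

```latex
% Schematic only: \mathcal{P} is a set of embedding distributions,
% \mathcal{F} a probe family (linear or nonlinear), and \mathcal{R}
% the downstream prediction risk. The claim is that the risk-minimizing
% embedding distribution is isotropic Gaussian.
p^{\star}
  \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}}
        \; \mathbb{E}_{f \in \mathcal{F}}\!\left[ \mathcal{R}(p, f) \right]
  \;=\; \mathcal{N}\!\left(0,\ \sigma^{2} I_{d}\right)
```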
SIGReg: To pull embeddings toward the ideal distribution above (an isotropic Gaussian), SIGReg (Sketched Isotropic Gaussian Regularization) is proposed. Roughly, the approach:
- sketches the embeddings via multiple random 1D projections, and
- applies a univariate goodness-of-fit test to each projection so that its 1D distribution matches the corresponding 1D projection of the target distribution, emphasizing characteristic function-based tests (see the sketch after this list)
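A minimal PyTorch sketch of this idea, assuming an Epps-Pulley-style statistic that compares each projection's empirical characteristic function against the standard Gaussian CF `exp(-t^2/2)`; the function name, direction count, and frequency grid here are illustrative choices, not the paper's:

```python
import torch

def sigreg_loss(z, num_directions=64, num_freqs=17, freq_max=3.0):
    """Hypothetical SIGReg sketch (not the official implementation).

    z: (N, D) batch of embeddings. Projects z onto random unit directions,
    then penalizes the squared distance between each projection's empirical
    characteristic function and the standard-Gaussian CF exp(-t^2 / 2),
    averaged over a uniform frequency grid.
    """
    N, D = z.shape
    # Random unit directions ("sketches") on the D-dimensional sphere.
    dirs = torch.randn(D, num_directions, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    proj = z @ dirs                            # (N, num_directions)

    t = torch.linspace(-freq_max, freq_max, num_freqs, device=z.device)
    tx = proj.unsqueeze(-1) * t                # (N, num_directions, num_freqs)
    # Empirical CF of each 1D projection, averaged over the batch.
    ecf_real = torch.cos(tx).mean(dim=0)       # (num_directions, num_freqs)
    ecf_imag = torch.sin(tx).mean(dim=0)
    # CF of the target N(0, 1) is real-valued: exp(-t^2 / 2).
    target = torch.exp(-0.5 * t ** 2)
    # Squared CF distance, averaged over directions and frequencies.
    err = (ecf_real - target) ** 2 + ecf_imag ** 2
    return err.mean()
```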
LeJEPA objective function: A simple sum of the standard JEPA "predictive loss between views" and the SIGReg term; the claim is that this alone prevents collapse, without tricks like stop-gradient, teacher-student networks, EMA schedules, or whitening (combined below).
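Putting the two terms together, a hypothetical training-loss sketch reusing `sigreg_loss` from above; `lam` stands in for the paper's single trade-off hyperparameter, and MSE is one common choice of predictive loss:

```python
import torch.nn.functional as F

def lejepa_loss(z_pred, z_target, lam=1.0):
    """LeJEPA objective sketch: prediction loss + lam * SIGReg.

    z_pred:   predictor output for one view, shape (N, D)
    z_target: embedding of the other view,  shape (N, D)
    """
    pred_loss = F.mse_loss(z_pred, z_target)   # predictive loss between views
    reg_loss = sigreg_loss(z_target)           # pull embeddings toward N(0, I)
    return pred_loss + lam * reg_loss
```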

Seonglae Cho