Joint Embedding Predictive Architecture

JEPA

A network architecture that encodes an input x and its corresponding target y into embedding vectors sₓ and sᵧ, then predicts sᵧ (the embedding of y) from sₓ rather than predicting y itself
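The definition above can be illustrated with a minimal sketch. It assumes linear encoders, a linear predictor, and a squared-distance loss, none of which come from the source; the names `linear`, `random_matrix`, and `jepa_loss` are hypothetical. The point is only that the prediction error is measured between embeddings, not between raw inputs.

```python
import random

def linear(weights, vec):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of uniform random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def jepa_loss(x, y, enc_x, enc_y, predictor):
    """Squared distance between the predicted and the actual embedding of y."""
    s_x = linear(enc_x, x)             # embedding of the input x
    s_y = linear(enc_y, y)             # embedding of the target y
    s_y_hat = linear(predictor, s_x)   # prediction of s_y computed from s_x
    # The error is measured in embedding space, not in input space.
    return sum((p - t) ** 2 for p, t in zip(s_y_hat, s_y))

rng = random.Random(0)
enc_x = random_matrix(2, 4, rng)       # assumed: 4-dim inputs, 2-dim embeddings
enc_y = random_matrix(2, 4, rng)
predictor = random_matrix(2, 2, rng)

x = [1.0, 0.0, 0.5, -0.5]
y = [0.9, 0.1, 0.4, -0.6]
loss = jepa_loss(x, y, enc_x, enc_y, predictor)
print(loss)
```

A practical training setup would also need some mechanism to keep the encoders from collapsing to a constant embedding, which this sketch omits.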

The Sensory Module estimates the state of the environment from input data and encodes it as an internal representation

The World Module predicts future states from these sensory representations, using latent variables to account for uncertainty

The Cost Module computes the cost of predicted states by combining an intrinsic cost with a learnable critic

The Actor Module searches for action sequences that minimize cost and selects the action to execute
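How the four modules might fit together can be sketched as a toy planning loop. Everything here is an assumption made for illustration: a one-dimensional state, additive dynamics, a goal at `GOAL`, a critic that returns zero until trained, and an exhaustive search over short action sequences. Latent variables are left out to keep the example small.

```python
from itertools import product

GOAL = 3.0  # hypothetical target state for the toy environment

def perceive(observation):
    """Sensory module: map a raw observation to an internal state estimate."""
    return float(observation)

def world_model(state, action):
    """World module: predict the next state (toy dynamics, no latent variable)."""
    return state + action

def intrinsic_cost(state):
    """Fixed, hand-written cost: distance from the goal."""
    return abs(state - GOAL)

def critic(state):
    """Stand-in for the learnable critic; returns 0 until trained."""
    return 0.0

def cost(state):
    """Cost module: intrinsic cost plus the critic's estimate."""
    return intrinsic_cost(state) + critic(state)

def plan(state, actions=(-1.0, 0.0, 1.0), horizon=3):
    """Actor module: search all action sequences for the lowest total cost."""
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)   # imagined rollout, not a real step
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq, best_cost

state = perceive(0)
best_seq, best_cost = plan(state)
next_action = best_seq[0]           # only the first planned action is executed
print(best_seq, best_cost, next_action)
```

Exhaustive search is used here only because the action set is tiny; it stands in for whatever optimization procedure the actor would actually use.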

Joint Embedding Predictive Architectures

I-JEPA

V-JEPA

H-JEPA

https://openreview.net/pdf?id=BZ5a1r-kVsf
