JEPA
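A network architecture that encodes an input x and its corresponding target y into embedding vectors sₓ and sᵧ, then predicts sᵧ from sₓ. Because the prediction error is measured between embeddings rather than between raw inputs, the model can ignore details of y that are unpredictable or irrelevant.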
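The definition above can be sketched in a few lines of plain Python. This is a toy under stated assumptions: each encoder is a single random tanh layer, the predictor is a linear map applied to the concatenation [sₓ, z], where z is an optional latent variable, and the energy is the squared distance between the predicted and the actual sᵧ. `ToyJEPA`, `energy`, and `free_energy` are hypothetical names, and no training loop is shown.

```python
import math
import random


def linear(W, v):
    """Matrix-vector product; W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng):
    """Small Gaussian weights (illustrative initialization only)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyJEPA:
    """Hypothetical minimal JEPA: separate encoders for x and y plus a predictor."""

    def __init__(self, in_dim, emb_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.W_x = random_matrix(emb_dim, in_dim, rng)  # encoder for x -> s_x
        self.W_y = random_matrix(emb_dim, in_dim, rng)  # encoder for y -> s_y
        # Predictor maps the concatenation [s_x, z] to a guess of s_y
        self.W_p = random_matrix(emb_dim, emb_dim + latent_dim, rng)

    def encode(self, W, v):
        # One linear layer followed by tanh
        return [math.tanh(h) for h in linear(W, v)]

    def energy(self, x, y, z):
        s_x = self.encode(self.W_x, x)
        s_y = self.encode(self.W_y, y)
        s_y_hat = linear(self.W_p, s_x + z)  # list concatenation = [s_x, z]
        # Error is measured between embeddings, never between raw x and y
        return sum((p - t) ** 2 for p, t in zip(s_y_hat, s_y))

    def free_energy(self, x, y, latents):
        # Lowest energy over a finite set of candidate latent values
        return min(self.energy(x, y, z) for z in latents)


model = ToyJEPA(in_dim=4, emb_dim=3, latent_dim=2)
x = [0.1, 0.2, 0.3, 0.4]
y = [0.2, 0.1, 0.0, -0.1]
latents = [[0.0, 0.0], [1.0, -1.0], [-0.5, 0.5]]
print(model.energy(x, y, [0.0, 0.0]))
print(model.free_energy(x, y, latents))
```

Minimizing over z lets one x be compatible with several plausible y embeddings. If this toy were trained, the encoders would also need some mechanism that keeps them from collapsing to a constant embedding, since a constant sᵧ makes the energy trivially zero for every pair.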
JEPA is proposed as the world model of a larger autonomous-agent architecture built from the following modules:
- Sensory Module estimates the current state of the environment from the input data and encodes it as an internal representation
- World Module predicts future states from the sensory representation, using latent variables to represent uncertainty about what will happen
- Cost Module computes the cost of predicted states as the sum of a fixed intrinsic cost and a learnable critic
- Actor Module searches for the action sequence that minimizes the predicted cost and outputs the actions to execute
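The loop formed by these four modules can be sketched as a toy planner in plain Python. Everything here is an assumption for illustration: the state is a single number, the world model is hand-written additive dynamics rather than a trained JEPA, the critic is a fixed function standing in for a learned one, and the actor uses brute-force search over a hypothetical three-action set (gradient-based optimization of the action sequence would be another option). All function names are hypothetical.

```python
import itertools
import math

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set


def perceive(observation):
    # Sensory module stand-in: the state is the 1-D position itself
    return observation


def world_model(state, action, z=0.0):
    # Predicts the next state; z is a latent offset standing in for uncertainty
    return state + action + z


def intrinsic_cost(state, goal):
    # Fixed (non-trainable) cost: squared distance to the goal
    return (state - goal) ** 2


def critic(state):
    # Stand-in for the learnable critic; fixed here for illustration
    return 0.01 * abs(state)


def plan(observation, goal, horizon=3, latents=(0.0,)):
    """Actor: exhaustive search for the action sequence with the lowest
    total cost, averaged over the given latent values."""
    s0 = perceive(observation)
    best_seq, best_cost = None, math.inf
    for seq in itertools.product(ACTIONS, repeat=horizon):
        cost = 0.0
        for z in latents:
            s = s0
            for a in seq:
                s = world_model(s, a, z)
                cost += intrinsic_cost(s, goal) + critic(s)
        cost /= len(latents)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost


seq, cost = plan(observation=0.0, goal=2.0)
print(seq, cost)  # seq == (1.0, 1.0, 0.0) for this goal
```

Averaging the cost over several latent values, e.g. `plan(0.0, 2.0, latents=(-0.5, 0.0, 0.5))`, is one simple way for the plan to account for the uncertainty the World Module represents. A common choice (receding-horizon, or model-predictive, control) is to execute only `seq[0]`, observe again, and replan.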
Joint Embedding Predictive Architectures