Video JEPA
Violation-of-Expectation (VoE)
An evaluation protocol that measures a model's understanding by how surprised it is when its predictions are violated: when observed outcomes diverge from the model's expectations, its prediction error (surprise) increases.
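A minimal sketch of the VoE protocol, using latent-space prediction error as the surprise score. Here `encoder` and `predictor` are toy stand-ins (identity modules) for the frozen V-JEPA components, and MSE is an illustrative choice of error metric, not necessarily the paper's exact one:

```python
import torch
import torch.nn.functional as F

def surprise(encoder, predictor, context, outcome):
    """Surprise = prediction error in latent space between the
    predicted future representation and the encoded actual outcome."""
    with torch.no_grad():
        predicted = predictor(encoder(context))  # model's expectation
        actual = encoder(outcome)                # what really happened
    return F.mse_loss(predicted, actual).item()

# Toy stand-ins for the frozen encoder and predictor (assumptions):
encoder = torch.nn.Identity()
predictor = torch.nn.Identity()

context = torch.zeros(1, 4)
expected = torch.zeros(1, 4)   # outcome matching the model's prediction
violating = torch.ones(1, 4)   # outcome that breaks the expectation

# VoE: surprise should be higher when the expectation is violated.
assert surprise(encoder, predictor, context, violating) > \
       surprise(encoder, predictor, context, expected)
```

No extra training is needed for this evaluation: the surprise signal falls directly out of the pretrained predictor's error.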
Attentive probe
Takes the entire token sequence (feature map) output by the encoder, applies cross-attention over it with a single learnable query to select and gather task-relevant information from the tokens, and passes the result through an MLP + classifier.
The attentive probe can concentrate its attention on regions with motion, hand-object interactions, and the patches most important for action classification.
Attentive probe = a lightweight evaluation head that uses learnable query-based cross-attention to read and gather important information from token features produced by a frozen encoder for classification.
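The definition above can be sketched as a small PyTorch module. The hyperparameters (embedding dim, head count, MLP ratio, class count) are illustrative assumptions, not the exact values used in the V-JEPA evaluations:

```python
import torch
import torch.nn as nn

class AttentiveProbe(nn.Module):
    """Lightweight evaluation head: a single learnable query cross-attends
    over the frozen encoder's tokens, and an MLP classifier reads the
    pooled vector. Only this probe is trained; the encoder stays frozen."""
    def __init__(self, dim=768, num_heads=12, num_classes=400):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))  # learnable query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp_head = nn.Sequential(  # MLP + classifier
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, num_classes),
        )

    def forward(self, tokens):  # tokens: (B, N, dim) from the frozen encoder
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)  # query reads the tokens
        return self.mlp_head(pooled.squeeze(1))   # logits: (B, num_classes)

probe = AttentiveProbe()
logits = probe(torch.randn(2, 196, 768))  # e.g. 196 patch tokens per clip
```

Because the query is learned per task, the attention weights themselves reveal which tokens (e.g. moving patches) the probe considers task-relevant.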
V-JEPA: The next step toward advanced machine intelligence
We’re releasing the Video Joint Embedding Predictive Architecture (V-JEPA) model, a crucial step in advancing machine intelligence with a more grounded understanding of the world.
https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/

Intuitive physics emerges solely from latent-space prediction: predict the future using latent representations learned from video → the prediction error serves as the world model's surprise signal.
https://arxiv.org/pdf/2502.11831

Seonglae Cho