The Meta FAIR team proposed IntPhys 2, a benchmark that uses photorealistic Unreal Engine environments to provide a more rigorous evaluation framework for intuitive physics understanding.
Based on developmental psychology's Violation of Expectation (VoE) paradigm, it evaluates four core physical principles: Object Permanence, Immutability, Spatio-Temporal Continuity, and Solidity. The evaluation metric is pairwise accuracy, which measures whether a model assigns higher surprise scores to the impossible videos than to the possible ones within each scene's quadruplet (2 possible, 2 impossible). For predictive models, the surprise score is defined in terms of a distance metric $d$ between the predicted frame and the actual frame.
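The pairwise accuracy described above can be sketched as follows. This is a minimal illustration, not the benchmark's official scoring code: the function name `pairwise_accuracy`, the dictionary layout, and the example scores are hypothetical. It assumes that every possible/impossible pairing within a quadruplet is compared (4 comparisons per quadruplet) and that ties count as incorrect.

```python
from itertools import product
from typing import Dict, List


def pairwise_accuracy(quadruplets: List[Dict[str, List[float]]]) -> float:
    """Fraction of (impossible, possible) pairs where the impossible
    video receives the strictly higher surprise score.

    Assumption: all cross pairs inside a quadruplet are compared,
    and a tie is counted as a miss.
    """
    correct = 0
    total = 0
    for quad in quadruplets:
        for s_impossible, s_possible in product(quad["impossible"], quad["possible"]):
            correct += int(s_impossible > s_possible)
            total += 1
    if total == 0:
        raise ValueError("no pairs to evaluate")
    return correct / total


# Made-up surprise scores for two quadruplets (2 possible, 2 impossible each).
example = [
    {"possible": [0.10, 0.20], "impossible": [0.30, 0.15]},
    {"possible": [0.50, 0.40], "impossible": [0.60, 0.70]},
]
acc = pairwise_accuracy(example)
print(acc)
```

Under this pairing assumption, a model that assigns surprise scores at random would land near 0.5, so that is the reference point for reading the result.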
The benchmark consists of 1,416 videos organized into 3 splits: Debug (5 scenes, 60 videos), Main (253 scenes, 1,012 videos across Easy/Medium/Hard difficulty levels), and Held-Out (86 scenes, 344 videos).
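As a quick sanity check on the split sizes, the counts can be tallied as below. The numbers are taken from the note itself; the variable names are illustrative only. The arithmetic shows that Main and Held-Out work out to 4 videos per scene, matching the quadruplet structure, while Debug works out to 12 per scene, which the note does not explain.

```python
# Split sizes as stated in the note above.
splits = {
    "Debug": {"scenes": 5, "videos": 60},
    "Main": {"scenes": 253, "videos": 1012},
    "Held-Out": {"scenes": 86, "videos": 344},
}

# Total number of videos across all splits.
total_videos = sum(s["videos"] for s in splits.values())

# Average number of videos per scene in each split.
videos_per_scene = {name: s["videos"] / s["scenes"] for name, s in splits.items()}

print(total_videos)
print(videos_per_scene)
```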
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex...
We present IntPhys 2, a video benchmark designed to evaluate the intuitive physics understanding of deep learning models. Building on the original IntPhys benchmark, IntPhys 2 focuses on four core...
https://arxiv.org/abs/2506.09849


Seonglae Cho