Video AI shares many aspects with 3D AI: both require understanding physical laws and spatial relationships.
Most video foundation models use masked autoencoders (MAE) for self-supervised pre-training, but they focus on short video sequences (16 or 32 frames).
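To make the masked-autoencoder idea concrete, here is a minimal sketch of the masking step for one short clip. It only illustrates how a 16-frame input might be split into spatiotemporal tokens and mostly hidden from the encoder. The function name `random_token_mask`, the 224x224 resolution, the 16-pixel patch, the 2-frame tubelet, and the 0.9 mask ratio are all illustrative assumptions, not values stated in these notes, and plain random masking stands in for whatever strategy a particular model uses.

```python
import numpy as np


def random_token_mask(num_frames=16, height=224, width=224,
                      tubelet=2, patch=16, mask_ratio=0.9, seed=0):
    """Return a boolean mask over spatiotemporal tokens (True = masked).

    All default values are assumptions chosen for illustration.
    """
    t = num_frames // tubelet   # temporal token count
    h = height // patch         # token rows per frame group
    w = width // patch          # token columns per frame group
    num_tokens = t * h * w
    num_masked = int(round(mask_ratio * num_tokens))

    rng = np.random.default_rng(seed)
    mask = np.zeros(num_tokens, dtype=bool)
    # Pick the masked token positions uniformly at random, without repeats.
    mask[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return mask.reshape(t, h, w)


mask = random_token_mask()
visible = int((~mask).sum())
print("token grid:", mask.shape, "visible tokens:", visible)
```

In an MAE-style setup the encoder would see only the visible tokens and a lightweight decoder would be trained to reconstruct the masked ones. Calling `random_token_mask(num_frames=32)` doubles the temporal token count, which hints at why token counts grow quickly for longer videos.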
Video AI Use Cases
Video AI Services
Generate high-quality videos from text or images for model training