Shares a lot with 3D AI, in that both involve understanding physical laws and space
Most video foundation models use a masked autoencoder (MAE) objective for self-supervised pre-training, but they focus on short clips (16 or 32 frames).
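As a rough illustration of the masking step in MAE-style video pre-training, the sketch below counts the spatiotemporal patches in a 16-frame clip and randomly hides most of them. The 224x224 resolution, the 2x16x16 patch size, the 90% mask ratio, and the function names are all assumptions for illustration, not taken from any particular model.

```python
import random

def patch_grid(frames, height, width, t=2, p=16):
    """Number of spatiotemporal patches along (time, height, width)."""
    if frames % t or height % p or width % p:
        raise ValueError("clip dimensions must be divisible by the patch size")
    return frames // t, height // p, width // p

def random_patch_mask(grid, mask_ratio=0.9, seed=None):
    """Pick which patches to hide; returns (masked, visible) index lists."""
    total = grid[0] * grid[1] * grid[2]
    num_masked = round(total * mask_ratio)
    rng = random.Random(seed)
    # Sample patch indices without replacement, then sort for readability.
    masked = sorted(rng.sample(range(total), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(total) if i not in masked_set]
    return masked, visible

# Hypothetical 16-frame, 224x224 clip (assumed sizes).
grid = patch_grid(frames=16, height=224, width=224)
masked, visible = random_patch_mask(grid, seed=0)
print(grid, len(masked), len(visible))
```

In an MAE-style setup, only the visible patches would be passed to the encoder, and a decoder would be trained to reconstruct the masked ones. Because the patch count grows linearly with the number of frames, this also hints at why short clips are the common choice.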
Video AI Use Cases
Video AI Services
Generate high-quality videos from text or images for model training