Sora is a latent diffusion model: it runs the diffusion process in latent space rather than pixel space. It uses a Diffusion Transformer (DiT) instead of a U-Net as the denoising network. One H100 GPU can generate up to 5 minutes of video per hour.
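To make the two claims above concrete, here is a rough back-of-the-envelope sketch. It assumes a hypothetical encoder that downsamples 4x in time and 8x in space, and a transformer that splits the volume into 2x2 spatial patches. None of these factors are confirmed values for Sora, and the function names are made up for illustration. The only figure taken from the note is the rate of 5 minutes of video per H100-hour.

```python
def token_count(frames, height, width, patch_t, patch_h, patch_w):
    """Number of spacetime patches (transformer tokens) in a video volume."""
    return (frames // patch_t) * (height // patch_h) * (width // patch_w)


def latent_shape(frames, height, width, t_down, s_down):
    """Shape of the latent volume after a hypothetical compressing encoder."""
    return frames // t_down, height // s_down, width // s_down


def gpu_hours(video_minutes, minutes_per_gpu_hour=5.0):
    """GPU-hours needed at the note's rate of up to 5 video minutes per hour."""
    return video_minutes / minutes_per_gpu_hour


# Hypothetical clip: 64 frames at 512x512 (illustrative values only).
frames, height, width = 64, 512, 512

# Tokens if the transformer worked directly on pixels with 2x2 patches.
pixel_tokens = token_count(frames, height, width, 1, 2, 2)

# Tokens after the assumed 4x temporal and 8x spatial compression.
lat_frames, lat_height, lat_width = latent_shape(frames, height, width, 4, 8)
latent_tokens = token_count(lat_frames, lat_height, lat_width, 1, 2, 2)

print("pixel-space tokens: ", pixel_tokens)
print("latent-space tokens:", latent_tokens)
print("reduction factor:   ", pixel_tokens // latent_tokens)
print("GPU-hours for 60 min of video:", gpu_hours(60))
```

Under these assumed factors the transformer sees far fewer tokens in latent space than it would in pixel space. That matters because self-attention cost grows quadratically with sequence length. At the quoted rate, one hour of video would need at least 12 H100-hours.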
Sora AI
Created: 2024 Feb 16 2:35
Edited: 2025 Jun 20 14:34