NVIDIA Cosmos Predict

ODE-based continuous-time generation, sampled with the UniPC solver rather than DDPM
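
The deterministic, ODE-based sampling noted above can be illustrated with a minimal sketch. It assumes an EDM-style probability-flow ODE, dx/dsigma = (x - D(x, sigma)) / sigma, and uses a plain first-order Euler step in place of UniPC (UniPC is a higher-order predictor-corrector solver for the same kind of ODE). `gaussian_denoiser` is a hypothetical closed-form stand-in for the trained network, and the linear noise schedule is chosen only for simplicity; none of this is taken from the Cosmos code.

```python
import math


def gaussian_denoiser(x, sigma, mu=0.0, s=1.0):
    """Toy denoiser (assumption): exact E[x0 | x] when the data is N(mu, s^2)."""
    return (s * s * x + sigma * sigma * mu) / (s * s + sigma * sigma)


def euler_ode_sample(x_init, sigmas, denoiser):
    """Integrate dx/dsigma = (x - D(x, sigma)) / sigma over a decreasing schedule.

    No noise is injected between steps, so the same x_init always yields the
    same sample. A DDPM-style ancestral sampler would add fresh noise per step.
    """
    x = x_init
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * slope
    return x


sigma_max, steps = 10.0, 1000
# Linear schedule from sigma_max down to exactly 0 (illustrative choice).
sigmas = [sigma_max * (1 - i / steps) for i in range(steps + 1)]

x_init = 10.0  # fixed starting point standing in for a noise draw
sample = euler_ode_sample(x_init, sigmas, gaussian_denoiser)

# For this toy denoiser the ODE has a closed-form solution to compare against.
exact = x_init / math.sqrt(1.0 + sigma_max ** 2)
```

With a learned denoiser the loop is unchanged; only the step rule differs when a higher-order solver is swapped in, which is where the savings in function evaluations come from.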

Cosmos Transfer

Built on top of Cosmos-Predict2.5, transferring data format between image/

Roles: Sim → Real, Real → Real. Training data augmentation and generation, action-conditioned future prediction.

Cosmos Policy (fine-tuned predict)

Method and results for single-stage fine-tuning of Cosmos-Predict2 for visuomotor control and planning. Based on Cosmos-Predict2-2B-Video2World, so unlike 2.5, it does not use flow matching. The policy is trained by imitation learning, via supervised fine-tuning.

Latent frame injection

Robot proprioception / action chunk / value are inserted as "image frame-like" latent frames into the sequence, allowing the video model's existing diffusion learning mechanism to jointly model all modalities.
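
A minimal sketch of the injection idea, under loudly stated assumptions: each latent frame is represented as a flat list of `FRAME_SIZE` values, a low-dimensional vector is packed by tiling it across the frame, and it is read back by averaging the copies. The packing scheme, the frame ordering, and all names here (`vector_to_latent_frame`, `inject_latent_frames`, `FRAME_SIZE`) are hypothetical illustrations, not the paper's implementation.

```python
def vector_to_latent_frame(vec, frame_size):
    """Tile a short vector until it fills one flattened latent frame (assumed packing)."""
    if not vec or len(vec) > frame_size:
        raise ValueError("vector must be non-empty and fit in one frame")
    return [vec[i % len(vec)] for i in range(frame_size)]


def latent_frame_to_vector(frame, dim):
    """Invert the tiling by averaging every copy of each entry."""
    return [sum(frame[i::dim]) / len(frame[i::dim]) for i in range(dim)]


def inject_latent_frames(image_latents, proprio, action_chunk, value, frame_size):
    """Append proprioception, action chunk, and value as extra latent frames.

    The ordering (images, proprio, actions, value) is illustrative only.
    """
    flat_actions = [a for action in action_chunk for a in action]
    extra = [proprio, flat_actions, [value]]
    return list(image_latents) + [vector_to_latent_frame(v, frame_size) for v in extra]


FRAME_SIZE = 16  # hypothetical C*H*W of one latent frame
image_latents = [[0.0] * FRAME_SIZE for _ in range(2)]  # two dummy image frames
proprio = [0.1, -0.2, 0.3]
action_chunk = [[0.5, -0.5], [0.25, 0.0]]  # 2 actions of 2 dimensions each

sequence = inject_latent_frames(image_latents, proprio, action_chunk, 0.8, FRAME_SIZE)
decoded_proprio = latent_frame_to_vector(sequence[2], len(proprio))
```

Because every modality ends up with the same frame shape, the sequence can be passed to the video model unchanged, which is what lets the existing diffusion objective cover all of them at once.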

Model outputs, generated simultaneously: (1) action chunk, (2) future state (image + proprioception), (3) value (expected reward of the future state).
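
One way the three outputs could support planning is a best-of-N scheme: draw several candidate predictions and execute the action chunk whose predicted value is highest. The sketch below is an assumption about how such a planner might look, not a description of the paper's procedure. `Prediction`, `best_of_n`, and `toy_sampler` are hypothetical names, and the future image is omitted for brevity.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Prediction:
    action_chunk: List[List[float]]  # sequence of actions to execute
    future_proprio: List[float]      # predicted future proprioception
    value: float                     # predicted expected reward of the future state


def best_of_n(sample_fn: Callable[[int], Prediction], n: int) -> Prediction:
    """Sample n candidates (one per seed) and keep the highest-value one."""
    candidates = [sample_fn(seed) for seed in range(n)]
    return max(candidates, key=lambda p: p.value)


def toy_sampler(seed: int) -> Prediction:
    """Stand-in for the policy model: value is higher the closer the action is to 0.5."""
    rng = random.Random(seed)
    a = rng.uniform(-1.0, 1.0)
    return Prediction(action_chunk=[[a]], future_proprio=[a], value=-abs(a - 0.5))


best = best_of_n(toy_sampler, n=8)
```

Since the value is generated jointly with the action chunk and future state, ranking candidates needs no separate critic network in this setup.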


https://arxiv.org/pdf/2601.16163
