NVIDIA Cosmos Predict

Creator
Seonglae Cho
Created
2026 Jan 7 14:19
Edited
2026 Feb 9 17:49
Uses ODE-based continuous-time generation with the UniPC solver, not DDPM-style sampling or Score matching.
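To make the ODE view concrete, here is a minimal sketch of deterministic sampling by integrating dx/dt = v(x, t). Everything in it is an assumption for illustration: `velocity` is a toy stand-in for a learned network, and Heun's method is used as a simple predictor-corrector stand-in, not UniPC itself. It only shows why a higher-order solver can reach a given accuracy in fewer steps than plain Euler.

```python
import math

def velocity(x, t, target=1.0):
    # Hypothetical stand-in for a learned velocity network: pulls x toward target.
    return target - x

def euler_sample(x0, steps, t_end=1.0):
    # First-order solver: one network evaluation per step.
    x, dt = x0, t_end / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x

def heun_sample(x0, steps, t_end=1.0):
    # Second-order predictor-corrector (Heun), used here instead of UniPC.
    x, dt = x0, t_end / steps
    for i in range(steps):
        t = i * dt
        v0 = velocity(x, t)
        x_pred = x + dt * v0                # predictor: Euler step
        v1 = velocity(x_pred, t + dt)
        x = x + dt * 0.5 * (v0 + v1)        # corrector: trapezoidal average
    return x

def exact(x0, t_end=1.0, target=1.0):
    # Closed-form solution of dx/dt = target - x, for measuring solver error.
    return target + (x0 - target) * math.exp(-t_end)

if __name__ == "__main__":
    x0 = -2.0  # plays the role of the initial noise sample
    for name, fn in [("euler", euler_sample), ("heun", heun_sample)]:
        print(name, abs(fn(x0, 8) - exact(x0)))
```

Because no noise is injected during integration, the same starting sample always maps to the same output, which is the practical difference from ancestral DDPM sampling.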

Cosmos Transfer

Built on top of Cosmos-Predict2.5 (repository: nvidia-cosmos/cosmos-transfer2.5).
Transfers data between formats such as image, Pointcloud, and video.
Roles: Sim → Real and Real → Real transfer, training data augmentation and generation, and conditional future prediction based on actions.
 
 

Cosmos Policy (fine-tuned predict)

Method and results for single-stage fine-tuning of Cosmos-Predict2 for visuomotor control and planning. It is based on Cosmos-Predict2-2B-Video2World, so unlike 2.5 it does not use flow matching. The policy is trained by imitation learning with supervised fine-tuning.

Latent frame injection

Robot proprioception / action chunk / value are inserted as "image frame-like" latent frames into the sequence, allowing the video model's existing diffusion learning mechanism to jointly model all modalities.
The model generates three outputs simultaneously: (1) an action chunk, (2) the future state (image + proprioception), and (3) a value (the expected reward of the future state).
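The idea of treating non-image modalities as frame-like latents can be sketched as below. This is an illustrative sketch only: the frame size, the tiling scheme used to fill a frame, the averaging used to read a vector back, and the ordering of frames in the sequence are all assumptions, and every function name is hypothetical rather than part of any Cosmos API.

```python
FRAME_SIZE = 16 * 4 * 4  # hypothetical latent frame volume (C * H * W), flattened

def vector_to_latent_frame(vec, frame_size=FRAME_SIZE):
    # Assumed encoding: repeat the vector until it fills one latent frame.
    if not 0 < len(vec) <= frame_size:
        raise ValueError("vector must be non-empty and fit in one frame")
    return [vec[i % len(vec)] for i in range(frame_size)]

def latent_frame_to_vector(frame, dim):
    # Assumed decoding: average the repeated copies back into one vector.
    sums, counts = [0.0] * dim, [0] * dim
    for i, v in enumerate(frame):
        sums[i % dim] += v
        counts[i % dim] += 1
    return [s / c for s, c in zip(sums, counts)]

def build_sequence(cur_image, cur_proprio, action_chunk, fut_proprio, fut_image, value):
    # Flatten the action chunk (timesteps x action dims) into one vector.
    flat_actions = [a for step in action_chunk for a in step]
    # Each entry: (name, latent frame, is_generated). Conditioning frames are
    # given to the model; generated frames are what it has to produce.
    return [
        ("cur_image", cur_image, False),
        ("cur_proprio", vector_to_latent_frame(cur_proprio), False),
        ("action_chunk", vector_to_latent_frame(flat_actions), True),
        ("fut_proprio", vector_to_latent_frame(fut_proprio), True),
        ("fut_image", fut_image, True),
        ("value", vector_to_latent_frame([value]), True),
    ]

if __name__ == "__main__":
    image = [0.0] * FRAME_SIZE            # placeholder for an encoded image latent
    actions = [[0.1] * 7 for _ in range(8)]  # 8 steps of a 7-dim action (made up)
    seq = build_sequence(image, [0.5] * 7, actions, [0.6] * 7, image, 0.9)
    print([name for name, _, generated in seq if generated])
```

Since every entry has the same shape as an image latent, the sequence can be passed through the video model unchanged, and the action chunk, future state, and value are read back from the generated frames.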
