NVIDIA Cosmos Predict

Creator: Seonglae Cho
Created: 2026 Jan 7 14:19
Edited: 2026 Mar 6 16:21
ODE-based continuous-time generation with the UniPC solver, not DDPM / score matching.
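To make the "ODE-based" part concrete, here is a minimal sketch of deterministic sampling by integrating dx/dt = v(x, t) from noise at t = 0 to a sample at t = 1. It is not Cosmos code: `toy_velocity` is a hypothetical stand-in for the learned network (it simply points along a straight line toward a known target), and the integrator is plain fixed-step Euler rather than UniPC, which is a higher-order predictor-corrector scheme.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field (assumption, not Cosmos).

    Points from x straight toward `target`; valid for t < 1.
    """
    return (target - x) / (1.0 - t)

def euler_ode_sample(x0, target, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.

    No noise is injected during the loop, so the same x0 always gives
    the same sample, unlike ancestral DDPM sampling.
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(4)              # starting point at t = 0
target = np.array([1.0, -2.0, 0.5, 3.0])    # toy "data" point
sample = euler_ode_sample(noise, target, num_steps=10)
```

A solver such as UniPC would replace the Euler update with a higher-order step so that fewer network evaluations are needed for the same accuracy; the surrounding loop structure stays the same.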

Cosmos Transfer

Built on top of Cosmos-Predict2.5 (repository: nvidia-cosmos/cosmos-transfer2.5).
Transfers data between formats such as image, point cloud, and video.
Roles: Sim → Real and Real → Real transfer, training data augmentation and generation, and conditional future prediction based on actions.

Cosmos Policy (fine-tuned predict)

Method and results for single-stage fine-tuning of Cosmos-Predict2 for visuomotor control and planning. It is based on Cosmos-Predict2-2B-Video2World, so unlike 2.5 it does not use flow matching. The policy is trained with imitation learning and supervised fine-tuning.

Latent frame injection

Robot proprioception / action chunk / value are inserted as "image frame-like" latent frames into the sequence, allowing the video model's existing diffusion learning mechanism to jointly model all modalities.
Model outputs: (1) action chunk, (2) future state (image + proprioception), (3) value (expected reward of future state) generated simultaneously.
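The injection idea can be sketched as follows. This is an illustration under stated assumptions, not the Cosmos Policy implementation: the latent shape, the tiling scheme used to fill a frame, the frame ordering, and all function names (`to_latent_frame`, `from_latent_frame`, `build_sequence`) are hypothetical.

```python
import numpy as np

LATENT_SHAPE = (16, 4, 4)  # hypothetical (channels, height, width) of one latent frame

def to_latent_frame(values, shape=LATENT_SHAPE):
    """Fill one frame-shaped array by repeating a flat vector (assumed scheme)."""
    flat = np.asarray(values, dtype=np.float32).ravel()
    size = int(np.prod(shape))
    reps = -(-size // flat.size)  # ceiling division
    return np.tile(flat, reps)[:size].reshape(shape)

def from_latent_frame(frame, out_shape):
    """Read the original values back from the first entries of the frame."""
    n = int(np.prod(out_shape))
    return frame.ravel()[:n].reshape(out_shape)

def build_sequence(image_latent, proprio, action_chunk,
                   future_image_latent, future_proprio, value):
    """Stack image latents and injected frames into one sequence.

    The video model would then denoise the whole stack jointly.
    """
    return np.stack([
        image_latent,                     # current observation
        to_latent_frame(proprio),         # current proprioception
        to_latent_frame(action_chunk),    # output 1: action chunk
        future_image_latent,              # output 2: future image
        to_latent_frame(future_proprio),  # output 2: future proprioception
        to_latent_frame([value]),         # output 3: value
    ])

rng = np.random.default_rng(0)
image_latent = rng.standard_normal(LATENT_SHAPE).astype(np.float32)
future_image_latent = rng.standard_normal(LATENT_SHAPE).astype(np.float32)
proprio = rng.standard_normal(7).astype(np.float32)             # e.g. 7 joint values
action_chunk = rng.standard_normal((8, 7)).astype(np.float32)   # 8 steps x 7 dims
seq = build_sequence(image_latent, proprio, action_chunk,
                     future_image_latent, proprio, 0.5)
recovered_actions = from_latent_frame(seq[2], action_chunk.shape)
```

Because every modality ends up with the same frame shape, no new output heads are needed in this sketch: actions, future state, and value are read back from their slots in the generated sequence.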
arxiv.org
NVIDIA Deep Imagination Research Homepage
arxiv.org
models
nvidia/Cosmos-Predict2.5-2B · Hugging Face
nvidia/Cosmos-Predict2.5-14B · Hugging Face