Cosmos Transfer
Built on top of Cosmos-Predict2.5; transfers data between formats such as image, point cloud, and video.
cosmos-transfer2.5
nvidia-cosmos • Updated 2026 Jan 7 2:8
Roles: Sim → Real and Real → Real transfer; training data augmentation and generation; conditional future prediction based on actions.
Cosmos Policy (fine-tuned predict)
Method and results for single-stage fine-tuning of Cosmos-Predict2 for visuomotor control and planning. Based on Cosmos-Predict2-2B-Video2World, so unlike 2.5 it does not use flow matching. The policy is trained by imitation learning through supervised fine-tuning.
Latent frame injection
Robot proprioception / action chunk / value are inserted as "image frame-like" latent frames into the sequence, allowing the video model's existing diffusion learning mechanism to jointly model all modalities.
Model outputs, generated simultaneously: (1) action chunk, (2) future state (image + proprioception), (3) value (expected reward of the future state).
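The idea of treating non-image modalities as extra latent frames can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names (`vector_to_latent_frame`, `latent_frame_to_vector`, `inject_latent_frames`), the tiling scheme used to fill a frame, the frame ordering, and the latent shape are hypothetical and are not taken from the Cosmos Policy code.

```python
import numpy as np

def vector_to_latent_frame(vec, channels, height, width):
    """Tile a flat vector until it fills one latent frame of shape (C, H, W).

    Assumption: tiling is only one plausible way to fill the frame.
    """
    vec = np.asarray(vec, dtype=np.float32).ravel()
    size = channels * height * width
    if vec.size == 0 or vec.size > size:
        raise ValueError("vector must be non-empty and fit in one latent frame")
    reps = -(-size // vec.size)  # ceiling division
    return np.tile(vec, reps)[:size].reshape(channels, height, width)

def latent_frame_to_vector(frame, dim):
    """Recover a dim-sized vector by averaging the complete tiled copies."""
    flat = np.asarray(frame, dtype=np.float32).ravel()
    n_full = flat.size // dim
    return flat[: n_full * dim].reshape(n_full, dim).mean(axis=0)

def inject_latent_frames(image_latents, proprio, action_chunk, value):
    """Append proprioception, action-chunk, and value frames to image latents.

    image_latents: array of shape (T, C, H, W).
    Returns an array of shape (T + 3, C, H, W).
    Assumption: frames are simply appended; the real ordering may differ.
    """
    _, c, h, w = image_latents.shape
    extras = [
        vector_to_latent_frame(x, c, h, w)
        for x in (proprio, action_chunk, [value])
    ]
    return np.concatenate(
        [image_latents.astype(np.float32), np.stack(extras)], axis=0
    )

# Hypothetical sizes: 2 image latent frames, 7-DoF robot, chunk of 8 actions.
image_latents = np.zeros((2, 16, 4, 4), dtype=np.float32)
proprio = np.linspace(-1.0, 1.0, 7)
action_chunk = np.arange(8 * 7, dtype=np.float32).reshape(8, 7) / 56.0
value = 0.5

sequence = inject_latent_frames(image_latents, proprio, action_chunk, value)
decoded_actions = latent_frame_to_vector(sequence[3], 8 * 7).reshape(8, 7)
```

Because every modality ends up with the same (C, H, W) shape, a video diffusion model could in principle denoise the whole sequence with its existing objective, and the action chunk or value is read back out of its frame after sampling.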
NVIDIA Deep Imagination Research Homepage
https://research.nvidia.com/labs/dir/cosmos-predict2.5/

models
nvidia/Cosmos-Predict2.5-2B · Hugging Face
https://huggingface.co/nvidia/Cosmos-Predict2.5-2B
nvidia/Cosmos-Predict2.5-14B · Hugging Face
https://huggingface.co/nvidia/Cosmos-Predict2.5-14B

Seonglae Cho