
Robot manipulation policies typically generate actions from only what is currently visible to the camera, which causes a sharp performance drop on tasks that require remembering out-of-view objects or goals. mindmap accumulates past observations into a spatial memory via metric-semantic 3D reconstruction (a TSDF + VFM-feature voxel map) and conditions a 3D diffusion policy (a trajectory-denoising transformer) on it to generate 3D end-effector trajectories.
- RGB-D frames are encoded by a VFM (AM-RADIO) into per-pixel feature maps, which are back-projected to 3D points using the depth channel (see the sketch after this list).
- In parallel, a TSDF is accumulated with nvblox and the VFM features are projected onto its voxels; the features attached to the extracted mesh vertices serve as reconstruction tokens.
- Current-observation tokens and reconstruction tokens pass through separate encoders, are concatenated, and the trajectory is denoised by attending to them (see the denoiser sketch below).
- For the humanoid setting, the policy additionally outputs bimanual control plus head-yaw control (for exploration/scanning and remembering locations).
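
A minimal sketch of the reconstruction side, assuming pinhole intrinsics and a dict-based sparse voxel map; names like `backproject_features`, `fuse_features`, and the 2 cm voxel size are illustrative placeholders, not the paper's code (which uses AM-RADIO features and nvblox TSDF fusion):

```python
import numpy as np

VOXEL_SIZE = 0.02  # assumed voxel resolution in meters

def backproject_features(depth, feats, K, T_world_cam):
    """Lift per-pixel VFM features to world-frame 3D points.

    depth: (H, W) metric depth; feats: (H, W, C) feature map at image resolution
    K: (3, 3) camera intrinsics; T_world_cam: (4, 4) camera-to-world pose
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)   # (N, 4) homogeneous
    pts_world = (T_world_cam @ pts_cam.T).T[:, :3]            # (N, 3)
    return pts_world, feats[valid]                            # points and their features

def fuse_features(voxel_feats, pts_world, pt_feats, alpha=1.0):
    """Splat point features into a sparse voxel feature map.

    alpha=1.0 is plain overwrite; alpha<1 gives EMA-style temporal blending
    (the ablation below finds the two perform about the same).
    """
    keys = np.floor(pts_world / VOXEL_SIZE).astype(np.int64)
    for key, f in zip(map(tuple, keys), pt_feats):
        old = voxel_feats.get(key)
        voxel_feats[key] = f if old is None else (1 - alpha) * old + alpha * f
    return voxel_feats
```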
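
And a sketch of how the two token streams could condition the trajectory denoiser: separate linear encoders, concatenation, and cross-attention from the noisy trajectory to the combined tokens. The class name, dimensions, and layer counts are assumptions for illustration, not the released architecture:

```python
import torch
import torch.nn as nn

class TrajectoryDenoiser(nn.Module):
    def __init__(self, obs_dim=768, recon_dim=768, d_model=256, act_dim=8):
        super().__init__()
        self.obs_enc = nn.Linear(obs_dim, d_model)      # current-observation tokens
        self.recon_enc = nn.Linear(recon_dim, d_model)  # reconstruction (mesh-vertex) tokens
        self.traj_in = nn.Linear(act_dim, d_model)
        self.time_emb = nn.Embedding(1000, d_model)     # diffusion timestep embedding
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, noisy_traj, t, obs_tokens, recon_tokens):
        # noisy_traj: (B, horizon, act_dim); obs/recon tokens: (B, N, dim); t: (B,)
        cond = torch.cat([self.obs_enc(obs_tokens),
                          self.recon_enc(recon_tokens)], dim=1)  # concatenated condition
        x = self.traj_in(noisy_traj) + self.time_emb(t)[:, None, :]
        x = self.decoder(tgt=x, memory=cond)                     # cross-attend to condition
        return self.head(x)                                      # predicted noise / clean trajectory
```

At inference, standard diffusion-policy sampling would start from Gaussian noise and apply this denoiser iteratively to produce the end-effector trajectory.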
mindmap achieves a 76% average success rate, a large improvement over 3D Diffuser Actor (20%); on humanoid tasks it outperforms GR00T N1 by +26 percentage points. The gap to a "privileged" setting (an external camera that removes the need for memory) is only 9 points. Limitations: a small, task-specific model trained on little data; keypose extraction is cumbersome; the reconstruction is non-differentiable; and storing per-voxel features carries memory overhead.
Ablations:
- Using the reconstruction alone degrades performance (current-view information still helps for pickup and similar actions).
- Replacing VFM features with raw RGB causes a large drop (semantic information is critical).
- Temporal blending of features (EMA) vs. simply overwriting them makes a negligible difference (cf. the `alpha` parameter in the fusion sketch above).
checkpoint

Seonglae Cho