Planning with a Latent Dynamics Model
This method trains a latent dynamics model and then performs model-based planning (MPC/MPPI) on top of it. In other words, instead of directly optimizing a policy network, actions are selected at each timestep by planning inside a learned world model.
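Below is a minimal sketch of this planning loop using a single MPPI iteration. It assumes a trained encoder (`encode`) and latent forward model (`predict`) already exist; these names, the `cost` callback, and all hyperparameters are illustrative stand-ins, not a specific implementation.

```python
import numpy as np

def mppi_plan(encode, predict, cost, obs, horizon=10, n_samples=256,
              action_dim=2, temperature=1.0, noise_std=0.5):
    """Sample action sequences, roll each out in latent space,
    and return the first action of the cost-weighted average plan."""
    z0 = encode(obs)  # encode current observation into latent state
    # Sample Gaussian action sequences: (n_samples, horizon, action_dim)
    actions = np.random.randn(n_samples, horizon, action_dim) * noise_std
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        z = z0
        for t in range(horizon):
            z = predict(z, actions[i, t])  # roll forward in latent space
            costs[i] += cost(z)            # accumulate per-step cost
    # Exponentially weight low-cost sequences (MPPI update)
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    plan = (weights[:, None, None] * actions).sum(axis=0)
    return plan[0]  # MPC: execute only the first action, then replan
```

In a full MPC loop this function is called once per timestep, so the plan is continually revised as new observations arrive.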
The goal is to train agents that generalize across multiple goals and new environments using only offline state-action trajectories, with no reward labels. A Joint Embedding Predictive Architecture (JEPA) is used to learn a latent dynamics model; planning on top of that model then produces action sequences that minimize the distance between the current state and the goal in latent space (the cost can also be redefined per task). This allows immediate transfer to new goals, new layouts, and new tasks without any reward annotations.
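Since the model is trained without rewards, the planning cost can be as simple as the latent distance to an encoded goal observation; swapping in a different cost retargets the same model to a new task with no retraining. A hedged sketch, reusing the hypothetical `encode` and `mppi_plan` names from above:

```python
import numpy as np

def make_goal_cost(encode, goal_obs):
    """Return a cost function scoring a latent state by its
    squared L2 distance to the encoded goal observation."""
    z_goal = encode(goal_obs)
    def cost(z):
        return float(np.sum((z - z_goal) ** 2))
    return cost

# Usage (all names illustrative):
# cost = make_goal_cost(encode, goal_image)
# action = mppi_plan(encode, predict, cost, current_obs)
```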