Planning with a Latent Dynamics Model
This method trains a latent dynamics model and then performs model-based planning (MPC/MPPI) on top of it. In other words, instead of directly optimizing a policy network, actions are selected at each timestep by planning inside a learned world model.
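Below is a minimal sketch of this planning loop using a single MPPI iteration. It assumes a trained encoder (`encode`) and latent forward model (`predict`) already exist; these names, the `cost` callback, and all hyperparameters are illustrative stand-ins, not a specific implementation.

```python
import numpy as np

def mppi_plan(encode, predict, cost, obs, horizon=10, n_samples=256,
              action_dim=2, temperature=1.0, noise_std=0.5):
    """Sample action sequences, roll each out in latent space,
    and return the first action of the cost-weighted average plan."""
    z0 = encode(obs)  # encode current observation into latent state
    # Sample Gaussian action sequences: (n_samples, horizon, action_dim)
    actions = np.random.randn(n_samples, horizon, action_dim) * noise_std
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        z = z0
        for t in range(horizon):
            z = predict(z, actions[i, t])  # roll forward in latent space
            costs[i] += cost(z)            # accumulate per-step cost
    # Exponentially weight low-cost sequences (MPPI update)
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    plan = (weights[:, None, None] * actions).sum(axis=0)
    return plan[0]  # MPC: execute only the first action, then replan
```

In a full MPC loop this function is called once per timestep, so the plan is continually revised as new observations arrive.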
The goal is to train agents that generalize across multiple goals and new environments using only offline state-action trajectories, with no reward labels. A Joint Embedding Predictive Architecture (JEPA) is used to learn a latent dynamics model; planning on top of that model then produces action sequences that minimize the distance between the current state and the goal in latent space (the cost can also be redefined per task). This allows immediate transfer to new goals, new layouts, and new tasks without any reward annotations.
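Since the model is trained without rewards, the planning cost can be as simple as the latent distance to an encoded goal observation; swapping in a different cost retargets the same model to a new task with no retraining. A hedged sketch, reusing the hypothetical `encode` and `mppi_plan` names from above:

```python
import numpy as np

def make_goal_cost(encode, goal_obs):
    """Return a cost function scoring a latent state by its
    squared L2 distance to the encoded goal observation."""
    z_goal = encode(goal_obs)
    def cost(z):
        return float(np.sum((z - z_goal) ** 2))
    return cost

# Usage (all names illustrative):
# cost = make_goal_cost(encode, goal_image)
# action = mppi_plan(encode, predict, cost, current_obs)
```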