Latent dynamics model
Retaining too much information makes the model slow and inaccurate, so the latent dynamics model focuses only on what is predictive of reward and retains only reward-related information. Specifically, it does not retain the initial state; it uses only the latent state.
That is, rather than learning a new "reward model" without rewards, we directly specify a task-invariant (agile) cost function that is easy to define on the latent representation.
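
To make the idea concrete, here is a minimal sketch of planning with a cost specified directly on the latent state. Everything in it is an illustrative assumption rather than the method described in these notes: the linear encoder `enc`, the linear latent transition defined by `A` and `B`, the squared-distance cost to a latent goal `z_goal`, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(obs: Sequence[float], enc: Matrix) -> Vector:
    """Map an observation to a lower-dimensional latent state (assumed linear)."""
    return matvec(enc, obs)


def latent_step(z: Sequence[float], u: Sequence[float], A: Matrix, B: Matrix) -> Vector:
    """Predict the next latent state from the current latent state and action
    (assumed linear: z' = A z + B u)."""
    return [az + bu for az, bu in zip(matvec(A, z), matvec(B, u))]


def latent_cost(z: Sequence[float], z_goal: Sequence[float]) -> float:
    """Hand-specified cost on the latent state: squared distance to a latent goal.
    No reward model is learned here (the quadratic form is an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(z, z_goal))


def rollout_cost(z0: Sequence[float], actions: Sequence[Sequence[float]],
                 A: Matrix, B: Matrix, z_goal: Sequence[float]) -> float:
    """Total cost of an action sequence, computed entirely in latent space."""
    z, total = list(z0), 0.0
    for u in actions:
        z = latent_step(z, u, A, B)
        total += latent_cost(z, z_goal)
    return total
```

A planner could compare candidate action sequences with `rollout_cost` and pick the cheapest one. After the initial `encode` call, the rollout touches only latent states, never raw observations, which is the point of the paragraph above.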

We fit the latent dynamics model by using
