Model-based Planning

Creator: Seonglae Cho
Created: 2024 May 2 12:15
Edited: 2024 May 31 3:49

Sampling-based optimization

Planning (search): roll out candidate action sequences with the model and pick the best one by its predicted return.

Generating (imaginary) training data (model rollout) via sampling (gradient-free):
  1. Sample H-step action sequences with the model
  2. Remember the best action sequence, i.e. the one with the highest predicted return (see the sketch below)
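A minimal sketch of this sampling loop, assuming a learned `dynamics(s, a) → s'` and `reward(s, a)` are available as plain functions; the names, bounds, and defaults here are hypothetical placeholders, not a specific codebase:

```python
import numpy as np

def random_shooting(dynamics, reward, s0, horizon=15, n_samples=1000,
                    action_dim=2, a_low=-1.0, a_high=1.0, rng=None):
    """Sample H-step action sequences, roll them out in the model, keep the best."""
    if rng is None:
        rng = np.random.default_rng()
    # Gradient-free: sample N candidate H-step action sequences uniformly at random.
    plans = rng.uniform(a_low, a_high, size=(n_samples, horizon, action_dim))
    returns = np.zeros(n_samples)
    for i, plan in enumerate(plans):
        s = s0
        for a in plan:
            returns[i] += reward(s, a)  # score under the learned reward model
            s = dynamics(s, a)          # imagined rollout, no real env steps
    # Remember the best action sequence; in MPC you would execute only its first action.
    return plans[np.argmax(returns)]
```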
 

Gradient-based learning

  • Roll out some policy (e.g. a random policy) for H steps with the model
  • Backpropagate through the model with the objective
  • Gradient ascent on the action sequence to get a better plan (see the sketch just below)
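A sketch of the gradient-based version, assuming `dynamics` and `reward` are differentiable PyTorch functions; these names and the hyperparameters are illustrative assumptions, not a particular library's API:

```python
import torch

def plan_by_backprop(dynamics, reward, s0, horizon=15, action_dim=2,
                     n_steps=100, lr=1e-2):
    """Optimize an H-step action sequence by backpropagating through the model."""
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(n_steps):
        s, ret = s0, torch.zeros(())
        for a in actions:                # H-step imagined rollout
            ret = ret + reward(s, a)
            s = dynamics(s, a)           # gradients flow back through the model
        opt.zero_grad()
        (-ret).backward()                # ascent on return = descent on -return
        opt.step()
    return actions.detach()
```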
Two versions of the sampling-based (gradient-free) planner, for contrast:
  1. Version 1: Guess & check (random shooting): sample random action sequences and choose the one with the highest predicted return
  2. Version 2: CEM (Cross-Entropy Method) iteration: refit the sampling distribution to the top-scoring (elite) sequences, resample, and repeat for a few iterations (see the CEM sketch below)
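A CEM sketch, where `score(plan)` is an imagined-rollout return like the one computed in the random-shooting sketch above; all names and defaults are placeholders:

```python
import numpy as np

def cem_plan(score, horizon=15, action_dim=2, n_samples=500, n_elites=50,
             n_iters=10, rng=None):
    """Iteratively refit a Gaussian over action sequences to the elite samples."""
    if rng is None:
        rng = np.random.default_rng()
    mu = np.zeros((horizon, action_dim))    # sampling-distribution mean
    sigma = np.ones((horizon, action_dim))  # per-dimension std dev
    for _ in range(n_iters):
        plans = rng.normal(mu, sigma, size=(n_samples, horizon, action_dim))
        returns = np.array([score(p) for p in plans])
        elites = plans[np.argsort(returns)[-n_elites:]]      # top-scoring plans
        mu, sigma = elites.mean(axis=0), elites.std(axis=0)  # refit, then repeat
    return mu  # the converged mean is the plan
```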

Can a model plan at an abstract level for long-horizon tasks?

Only practical for short-horizon problems or heavily shaped reward functions: making long plans is too computationally expensive, and the model is not accurate over long horizons because prediction errors compound step by step.

Generative RL

What transfers across environments and tasks? Text-to-video generation (instruction → state imagination → action) as a universal planner. It requires no state or action labels, just images, and no reward or task information.

UniPi

Universal Policy
How can we execute the plan?
Train an inverse dynamics model for each robot, since the dynamics differ across robots; for this, the state is an image.
We don't have real-world third-person points of view, but the hope is that they used all kinds of viewpoint images for generalization (depth images are not used for training in this paper, though).
UniPi can synthesize a diverse set of behaviors that satisfy language instructions, which is a plausible way to extend decision making (see the sketch below).
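A minimal sketch of the per-robot inverse-dynamics idea, assuming 64×64 RGB frames and a continuous action space; the CNN, shapes, and training snippet are illustrative assumptions, not UniPi's actual architecture:

```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    """Predict the action that takes image state s_t to s_{t+1}."""
    def __init__(self, action_dim=7):
        super().__init__()
        # Two RGB frames stacked on the channel axis -> 6 input channels.
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, frame_t, frame_t1):
        return self.net(torch.cat([frame_t, frame_t1], dim=1))

# Each robot trains its own model on (frame_t, frame_t1, action_t) triples,
# then executes a generated video plan frame by frame.
model = InverseDynamics()
f_t, f_t1 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)  # dummy frames
a_t = torch.randn(8, 7)                                           # dummy actions
loss = nn.functional.mse_loss(model(f_t, f_t1), a_t)
loss.backward()
```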

UniSim

Universal simulator
  • - Computationally heavy
  • + Can provide web-scale knowledge like LLMs by leveraging abundant data in various forms
