Model-based policy optimization

Creator: Seonglae Cho
Created: 2024 May 3 3:45
Edited: 2024 May 10 7:34

Option 1: Distill planner’s actions into a policy

The distilled policy is no longer compute-intensive at test time, but it is still limited to short-horizon problems.
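A minimal sketch of distillation on a hypothetical toy problem (1-D state, dynamics s' = s + a, reward -(s')², all assumed for illustration). `plan` is a brute-force planner that enumerates every action sequence on a coarse grid, standing in for random shooting or CEM, and returns the first action of the best sequence. `distill` fits a linear policy to the planner's actions by least squares. Only the cheap fitted policy is needed at test time.

```python
import itertools

# Toy 1-D problem (assumed): s' = s + a, reward = -(s')**2.
ACTIONS = [k / 10 for k in range(-10, 11)]  # coarse action grid in [-1, 1]


def model_step(s, a):
    """Hypothetical learned dynamics model; here it is simply the true dynamics."""
    s_next = s + a
    return s_next, -s_next ** 2


def plan(s, horizon=3):
    """Brute-force planner: score every action sequence, return the first action of the best."""
    best_ret, best_first = float("-inf"), None
    for seq in itertools.product(ACTIONS, repeat=horizon):
        state, ret = s, 0.0
        for a in seq:
            state, r = model_step(state, a)
            ret += r
        if ret > best_ret:
            best_ret, best_first = ret, seq[0]
    return best_first


def distill(states):
    """Fit a linear policy a = w * s + b to the planner's actions by least squares."""
    actions = [plan(s) for s in states]  # expensive: one full search per state
    n = len(states)
    ms, ma = sum(states) / n, sum(actions) / n
    cov = sum((s - ms) * (a - ma) for s, a in zip(states, actions))
    var = sum((s - ms) ** 2 for s in states)
    w = cov / var
    b = ma - w * ms
    return (lambda s: w * s + b), w, b


states = [k / 10 for k in range(-10, 11)]
policy, w, b = distill(states)
print(round(w, 3), round(b, 3))  # fitted policy is close to a = -s on this range
print(policy(0.45))  # cheap at test time: one multiply-add instead of 21**3 model rollouts
```

The linear fit is only meaningful on the sampled range [-1, 1]. Outside it, the planner's actions saturate at the action bounds, so a richer policy class trained by behavior cloning would be used in practice.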
 

Option 2: Plan with terminal value function
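With a terminal value function, each candidate action sequence is scored by its model-predicted return over a short horizon H plus a bootstrapped value of the state where it ends: r_0 + γ r_1 + ... + γ^(H-1) r_(H-1) + γ^H V(s_H). The sketch below assumes a toy chain with a single reward at a distant goal, and an oracle `value` function that stands in for a learned V. `model_step`, `value`, and `plan` are hypothetical names.

```python
import itertools

GOAL, GAMMA = 10, 0.9  # toy chain (assumed): states 0..GOAL, reward 1 on entering GOAL


def model_step(s, a):
    """Hypothetical learned model of the chain; a is -1 (left) or +1 (right)."""
    s_next = min(max(s + a, 0), GOAL)
    return s_next, (1.0 if s_next == GOAL else 0.0)


def value(s):
    """Assumed terminal value V(s); here the exact value of walking straight to the goal."""
    return 0.0 if s == GOAL else GAMMA ** (GOAL - s - 1)


def plan(s, horizon, use_value):
    """Score every action sequence by model return (+ gamma^H * V(s_H)); return the best first action."""
    best_score, best_first = float("-inf"), None
    for seq in itertools.product((-1, 1), repeat=horizon):
        state, score = s, 0.0
        for t, a in enumerate(seq):
            state, r = model_step(state, a)
            score += GAMMA ** t * r
            if state == GOAL:  # goal is absorbing: no further reward
                break
        if use_value:
            score += GAMMA ** horizon * value(state)  # value(GOAL) is 0, so no double counting
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first


print(plan(0, horizon=2, use_value=False))  # -1: no reward within 2 steps, so every sequence ties at 0
print(plan(0, horizon=2, use_value=True))   # 1: V(s_H) steers the short plan toward the distant goal
```

With H = 2 the goal 10 steps away is invisible to the planner, so without V it picks an arbitrary action. The terminal value is what lets a short-horizon planner make progress toward a distant reward.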

 
 

Option 3: Augment model-free RL methods with data from model

If the model generates full trajectories from initial states, it may not be accurate over long horizons, since its errors compound with every step. Generating only partial trajectories from initial states avoids this, but may not give good coverage of later states. So instead, augment the data by generating short partial trajectories starting from all states in the data.
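A minimal sketch of these branched rollouts. Start states are sampled from every state in the real buffer, the model is rolled out for only k steps under the current policy, and model-free updates draw batches that mix real and synthetic transitions. The `model`, `policy`, toy data, `k`, and `real_ratio` values are all assumptions for illustration, and the model-free learner itself (e.g. SAC) is omitted.

```python
import random


def model(s, a):
    """Hypothetical learned dynamics/reward model: returns (s_next, reward)."""
    s_next = s + a
    return s_next, -s_next ** 2


def policy(s):
    """Hypothetical current policy being trained by the model-free learner."""
    return max(-1.0, min(1.0, -0.5 * s))


def branched_rollouts(real_buffer, k, n_starts, rng):
    """Start short k-step model rollouts from states sampled from all real data."""
    synthetic = []
    for _ in range(n_starts):
        s = rng.choice(real_buffer)[0]  # any visited state, not only initial states
        for _ in range(k):
            a = policy(s)
            s_next, r = model(s, a)
            synthetic.append((s, a, r, s_next))
            s = s_next
    return synthetic


def sample_batch(real_buffer, model_buffer, batch_size, real_ratio, rng):
    """Mix real and model-generated transitions for the model-free update."""
    n_real = int(batch_size * real_ratio)
    return rng.sample(real_buffer, n_real) + rng.sample(model_buffer, batch_size - n_real)


rng = random.Random(0)
# Toy real transitions (s, a, r, s_next) from one trajectory starting at s = 5.
real_buffer = []
s = 5.0
for _ in range(20):
    a = rng.uniform(-1, 1)
    s_next = s + a
    real_buffer.append((s, a, -s_next ** 2, s_next))
    s = s_next

model_buffer = branched_rollouts(real_buffer, k=3, n_starts=50, rng=rng)
batch = sample_batch(real_buffer, model_buffer, batch_size=32, real_ratio=0.05, rng=rng)
print(len(model_buffer), len(batch))  # 150 32
```

Keeping k small limits compounding model error, while sampling start states from the whole buffer covers later states that rollouts from initial states alone would miss.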
