Option 1: Distill planner’s actions into a policy
No longer compute-intensive at test time, but still limited to short-horizon problems
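A minimal sketch of this distillation, assuming a toy linear `planner` as a hypothetical stand-in for an expensive search procedure; the policy is fit by supervised regression on the planner's chosen actions:

```python
import numpy as np

rng = np.random.default_rng(0)

def planner(state):
    """Hypothetical expensive planner: here just a fixed linear rule."""
    return -2.0 * state  # pretend this action came from search

# Collect (state, action) pairs from the planner, then fit a policy
# by supervised learning (least squares for a linear policy).
states = rng.normal(size=(100, 1))
actions = np.array([planner(s) for s in states])

# Linear policy a = s @ w, fit with least squares.
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(state):
    return state @ w  # cheap at test time: no planning needed

s = np.array([1.5])
print(np.allclose(policy(s), planner(s)))
```

The distilled policy avoids planning at test time, but it only covers the horizons the planner itself handled.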
Option 2: Plan with terminal value function
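A toy sketch of this idea: plan over a short horizon and score each candidate action sequence by its summed rewards plus a terminal value estimate `V(s_H)`. The dynamics, reward, and value function here are illustrative assumptions, not a real environment:

```python
import numpy as np

def step(s, a):
    return s + a  # toy deterministic dynamics

def reward(s, a):
    return -(s ** 2) - 0.1 * (a ** 2)  # quadratic cost around 0

def terminal_value(s):
    return -5.0 * (s ** 2)  # hypothetical learned V(s)

def plan(s0, horizon=3):
    """Score constant-action candidates: sum of rewards + V(final state)."""
    best_a, best_score = None, -np.inf
    for a in np.linspace(-1.0, 1.0, 21):
        s, score = s0, 0.0
        for _ in range(horizon):
            score += reward(s, a)
            s = step(s, a)
        score += terminal_value(s)  # V stands in for the rest of the return
        if score > best_score:
            best_a, best_score = a, score
    return best_a

print(plan(3.0))
```

The terminal value function lets a short-horizon planner account for long-horizon consequences without rolling the model out further.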
Option 3: Augment model-free RL methods with data from model
When the model generates full trajectories from initial states, it may be inaccurate over long horizons. Generating partial trajectories only from initial states also gives poor coverage of later states. Instead, augment the data by generating short partial trajectories from all states in the dataset.
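The branching scheme above can be sketched as follows; the learned `model` and `policy` are toy stand-ins, and the point is the loop structure, short rollouts from every state in the real-data buffer rather than long rollouts from initial states:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(s, a):
    """Hypothetical learned dynamics model."""
    return 0.9 * s + a

def policy(s):
    return -0.5 * s + rng.normal(scale=0.1)  # exploratory policy

real_states = list(rng.normal(size=5))  # states collected in the real env

def augment(states, k=3):
    """Branch a length-k model rollout from EVERY state in the buffer."""
    synthetic = []
    for s in states:
        for _ in range(k):  # short horizon keeps model error small
            a = policy(s)
            s_next = model(s, a)
            synthetic.append((s, a, s_next))
            s = s_next
    return synthetic

data = augment(real_states)
print(len(data))  # 5 buffer states * 3 model steps = 15 synthetic transitions
```

Because every visited state spawns its own short rollout, the synthetic data covers later states while each individual rollout stays within the model's accurate horizon.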