Cross-Entropy Method
Sampling-based optimization
For a Gaussian sampling distribution, the sample mean and variance of the elite samples minimize the cross-entropy (CE) between the target distribution (approximated by the elites) and the sampling distribution.
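The claim above can be written out as a short derivation. This is a sketch under assumptions not stated in the notes: the sampling distribution q_θ is a diagonal Gaussian, the target p* is approximated by the K elite samples a_1, ..., a_K, and the symbols (q_θ, p*, a_i, μ, σ²) are introduced here for illustration only.

```latex
\[
\begin{aligned}
\theta^{\star}
  &= \arg\min_{\theta} \; H(p^{*}, q_{\theta})
   = \arg\min_{\theta} \; -\mathbb{E}_{a \sim p^{*}}\!\left[\log q_{\theta}(a)\right] \\
  &\approx \arg\max_{\theta} \; \frac{1}{K} \sum_{i=1}^{K} \log q_{\theta}(a_i),
  \qquad q_{\theta} = \mathcal{N}(\mu, \sigma^{2}) \\[4pt]
\Rightarrow \quad
\mu &= \frac{1}{K} \sum_{i=1}^{K} a_i,
\qquad
\sigma^{2} = \frac{1}{K} \sum_{i=1}^{K} (a_i - \mu)^{2}
\end{aligned}
\]
```

In other words, minimizing CE against the elite set is the same as a maximum-likelihood Gaussian fit, which is why the update reduces to the elite mean and variance.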
CEM iterations: more planning iterations → better performance; 5 iterations is a good operating point.
Set an action distribution, sample actions from it, then update its parameters by re-fitting the distribution to the top-K elite actions.
CEM generates actions purely by planning through the model, without any explicit policy.
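The loop described above (sample, score through the model, re-fit to the top-K elites) might look roughly like the sketch below. Everything named here is an assumption for illustration: `cem_plan`, `rollout_returns`, the diagonal Gaussian over action sequences, and the toy 1-D model (`toy_dynamics`, `toy_reward`) are hypothetical and not taken from the notes.

```python
import numpy as np

def rollout_returns(state, actions, dynamics, reward):
    """Score action sequences of shape (n, horizon, action_dim) through the model."""
    n, horizon, _ = actions.shape
    states = np.repeat(state[None], n, axis=0)
    returns = np.zeros(n)
    for t in range(horizon):
        returns += reward(states, actions[:, t])
        states = dynamics(states, actions[:, t])
    return returns

def cem_plan(state, dynamics, reward, horizon=5, action_dim=1,
             n_samples=200, n_elites=20, n_iters=5, seed=0):
    """Minimal CEM planner sketch: no policy, only sampling plus the model."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))   # initial action distribution
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        actions = mean + std * noise
        returns = rollout_returns(state, actions, dynamics, reward)
        # Keep the top-K elites and re-fit mean and std to them.
        elites = actions[np.argsort(returns)[-n_elites:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6      # small floor avoids collapse to zero
    return mean[0], mean                     # first action, full planned sequence

# Hypothetical toy model: 1-D point that should move to position 3.
def toy_dynamics(s, a):
    return s + a

def toy_reward(s, a):
    return -np.sum((s + a - 3.0) ** 2, axis=-1) - 0.01 * np.sum(a ** 2, axis=-1)

first_action, plan = cem_plan(np.array([0.0]), toy_dynamics, toy_reward)
```

In a receding-horizon setup only `first_action` would be executed before re-planning from the next state; the defaults (200 samples, 20 elites, 5 iterations) are illustrative values, not tuned settings.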
Random shooting
Guess and check: sample actions once and keep the best, with no elite fitting. CEM improves on this by re-fitting the sampling distribution to the selected elites.
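For contrast, random shooting can be sketched as a single round of sampling with no distribution update. As before, this is only an illustration under assumptions: `random_shooting`, the fixed standard-normal sampling distribution, and the toy 1-D model are hypothetical names and choices, not from the notes.

```python
import numpy as np

def random_shooting(state, dynamics, reward, horizon=5, action_dim=1,
                    n_samples=1000, seed=0):
    """Guess and check: sample once, roll out through the model, keep the best."""
    rng = np.random.default_rng(seed)
    # Candidate action sequences from a fixed distribution (never re-fit).
    actions = rng.standard_normal((n_samples, horizon, action_dim))
    states = np.repeat(state[None], n_samples, axis=0)
    returns = np.zeros(n_samples)
    for t in range(horizon):
        returns += reward(states, actions[:, t])
        states = dynamics(states, actions[:, t])
    best = int(np.argmax(returns))
    return actions[best], returns[best]

# Hypothetical toy model: 1-D point that should move to position 3.
def toy_dynamics(s, a):
    return s + a

def toy_reward(s, a):
    return -np.sum((s + a - 3.0) ** 2, axis=-1) - 0.01 * np.sum(a ** 2, axis=-1)

best_plan, best_return = random_shooting(np.array([0.0]), toy_dynamics, toy_reward)
```

The only structural difference from CEM is the missing re-fit step: the sampling distribution stays where it started, so all improvement has to come from drawing more samples.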