Make the best decision given current information
Regret - a measure of your total mistake cost
= the difference between the optimal (expected) rewards and your (expected) rewards, including the sub-optimal rewards you collect early on while still learning
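A minimal sketch of total regret for a multi-armed bandit. It assumes the true expected reward of every arm is known, which is only possible in a simulation. Write V* for the best arm's expected reward and Q(a_t) for the expected reward of the action taken at step t. Total regret after t steps is then L_t = E[ sum_{τ=1..t} (V* - Q(a_τ)) ]. The code computes that sum for one fixed action sequence. `total_regret` is a hypothetical helper name.

```python
from typing import Sequence


def total_regret(true_means: Sequence[float], actions: Sequence[int]) -> float:
    """Sum of per-step gaps V* - Q(a_t) over the chosen actions.

    Assumption: true_means[a] is the known expected reward of arm a
    (only available in simulation, not to the learning agent).
    """
    v_star = max(true_means)  # expected reward of the best arm
    return sum(v_star - true_means[a] for a in actions)


if __name__ == "__main__":
    # Arms with expected rewards 1, 2, 4; the agent tries arms 0 and 1 first,
    # then settles on the best arm (2), which adds no further regret.
    print(total_regret([1.0, 2.0, 4.0], [0, 1, 2, 2]))
```

The early exploratory pulls of arms 0 and 1 contribute all of the regret (3 + 2 = 5). Pulling the optimal arm costs nothing.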
Entropy Bonus
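A minimal sketch of an entropy bonus, assuming the common setup where β·H(π(·|s)) is added to the policy's objective. Higher entropy means a more spread-out policy, so the bonus discourages early collapse onto a deterministic policy and keeps the agent exploring. The names `softmax_policy`, `policy_entropy`, `objective_with_entropy_bonus` and the coefficient `beta` are illustrative assumptions. A real policy-gradient implementation would compute this on tensors and differentiate through it.

```python
import math
from typing import List, Sequence


def softmax_policy(logits: Sequence[float]) -> List[float]:
    """Turn action preferences into probabilities (numerically stable softmax)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H = -sum p log p (natural log); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def objective_with_entropy_bonus(expected_return: float,
                                 logits: Sequence[float],
                                 beta: float = 0.01) -> float:
    """Hypothetical objective: expected return plus beta times the policy's entropy."""
    return expected_return + beta * policy_entropy(softmax_policy(logits))


if __name__ == "__main__":
    # A uniform policy over 4 actions has the maximum entropy, log(4).
    print(policy_entropy(softmax_policy([0.0, 0.0, 0.0, 0.0])))
    # A near-deterministic policy has entropy close to 0, so it earns almost no bonus.
    print(policy_entropy(softmax_policy([100.0, 0.0])))
```

The coefficient `beta` trades off return against exploration. A larger value keeps the policy more random for longer.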
In policy iteration, policy evaluation is the prediction step and policy improvement is the control step.
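The split can be sketched as a small policy iteration loop. It assumes a finite MDP given in a hypothetical dictionary format, P[s][a] = list of (probability, next_state, reward) tuples. `policy_evaluation` is the prediction step: it computes V^π for a fixed π. `policy_improvement` is the control step: it acts greedily with respect to V^π. The loop stops once improvement leaves the policy unchanged. All function names here are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format: P[s][a] = list of (probability, next_state, reward).
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P: Transitions, policy: Dict[int, int],
                      gamma: float = 0.9, theta: float = 1e-8) -> Dict[int, float]:
    """Prediction: iterate the Bellman expectation backup until V^pi stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update
        if delta < theta:
            return V


def policy_improvement(P: Transitions, V: Dict[int, float],
                       gamma: float = 0.9) -> Dict[int, int]:
    """Control: pick the greedy action in every state with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P: Transitions, gamma: float = 0.9):
    policy = {s: next(iter(P[s])) for s in P}  # start from an arbitrary action
    while True:
        V = policy_evaluation(P, policy, gamma)        # prediction
        new_policy = policy_improvement(P, V, gamma)   # control
        if new_policy == policy:                       # policy is stable
            return policy, V
        policy = new_policy


if __name__ == "__main__":
    # Two states: from state 0, action 1 moves to state 1; in state 1,
    # action 0 stays and earns reward 1 per step. All other moves earn 0.
    P = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
        1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
    }
    policy, V = policy_iteration(P)
    print(policy, V)
```

With gamma = 0.9, the loop should settle on "go to state 1, then stay", with V(1) close to 1 / (1 - 0.9) = 10 and V(0) close to 0.9 · 10 = 9.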