RL Exploitation

Created: 2019 Nov 5 5:18
Edited: 2024 Jun 18 12:31

Exploitation means making the best decision given current information

Regret - a measure of the total cost of your mistakes
= the difference between the (expected) rewards you actually collect, including early ("youthful") sub-optimal choices, and the optimal (expected) rewards
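A minimal sketch of that definition, assuming a multi-armed bandit whose true arm means are known to the evaluator. The arm means, the sequence of choices, and the name `cumulative_regret` are all hypothetical and only illustrate the bookkeeping.

```python
def cumulative_regret(true_means, chosen_arms):
    """Return the running expected regret after each choice."""
    best = max(true_means)  # expected reward of the optimal arm
    total = 0.0
    history = []
    for arm in chosen_arms:
        # Per-step regret: gap between the optimal arm and the chosen arm.
        total += best - true_means[arm]
        history.append(total)
    return history


true_means = [0.2, 0.5, 0.8]    # hypothetical arm means
chosen_arms = [0, 1, 2, 2, 2]   # early sub-optimal pulls, then the best arm
regret = cumulative_regret(true_means, chosen_arms)
```

Once the best arm is chosen consistently, the running total stops growing, so all of the regret in this example comes from the early sub-optimal pulls.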
 

Entropy Bonus

 
 
Policy evaluation is the prediction step in policy iteration.
Policy improvement is the control step in policy iteration.
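The two steps can be sketched as a loop, assuming a tiny deterministic MDP. The transition table `P`, the discount `GAMMA`, and the function names are hypothetical and chosen only to keep the example short.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical 2-state, 2-action deterministic MDP:
# P[state][action] = (next_state, reward)
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}


def evaluate(policy, tol=1e-10):
    """Prediction: estimate V for a fixed policy with repeated Bellman backups."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            next_state, reward = P[s][policy[s]]
            v_new = reward + GAMMA * V[next_state]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve(V):
    """Control: pick the greedy action in every state with respect to V."""
    return {
        s: max(P[s], key=lambda a: P[s][a][1] + GAMMA * V[P[s][a][0]])
        for s in P
    }


def policy_iteration():
    policy = {s: 0 for s in P}  # arbitrary starting policy
    while True:
        V = evaluate(policy)       # prediction step
        new_policy = improve(V)    # control step
        if new_policy == policy:   # greedy policy is stable, so stop
            return policy, V
        policy = new_policy


policy, V = policy_iteration()
```

The greedy improvement step is pure exploitation: it always takes the action that looks best under the current value estimate.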
 
 
 
