RL Exploitation

Created: 2019 Nov 5 5:18
Edited: 2025 Aug 25 17:18

Exploitation means making the best decision given current information.

Regret - a measure of the total cost of your mistakes
= the difference between the optimal (expected) rewards and your own (expected) rewards, including the sub-optimal choices made early on
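As a minimal sketch of the definition above, cumulative regret in a multi-armed bandit can be computed by summing, at each step, the gap between the best arm's mean reward and the mean reward of the arm actually chosen. The function name `cumulative_regret`, the arm means, and the sequence of pulls are all hypothetical and chosen only for illustration; in practice the true means are unknown, so regret is an analysis tool rather than something the agent observes.

```python
def cumulative_regret(true_means, chosen_arms):
    """Expected cumulative regret after each step, given known arm means."""
    best = max(true_means)  # expected reward of the optimal arm
    total = 0.0
    curve = []
    for arm in chosen_arms:
        total += best - true_means[arm]  # per-step gap to the optimal arm
        curve.append(total)
    return curve


# Hypothetical 3-armed bandit in which arm 2 is optimal.
true_means = [0.2, 0.5, 0.8]
# Early sub-optimal pulls, followed by pure exploitation of the best arm.
chosen_arms = [0, 1, 2, 2, 2]
regret = cumulative_regret(true_means, chosen_arms)
```

Once the agent settles on the optimal arm, the curve stops growing: the regret accumulated during the early sub-optimal pulls stays in the total, but no new regret is added.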

Entropy Bonus
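The note gives no detail under this heading, so the following is only a hedged sketch of the usual idea: add a term proportional to the policy's entropy to the objective, so that a policy that exploits too early (collapses to one action) scores slightly lower than one that keeps some randomness. The names `entropy` and `objective_with_entropy_bonus` and the coefficient `beta` are assumptions made for illustration, not part of the original note.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def objective_with_entropy_bonus(expected_reward, probs, beta=0.01):
    """Expected reward plus beta times the policy entropy (beta is assumed)."""
    return expected_reward + beta * entropy(probs)


# Two hypothetical policies with the same expected reward.
greedy = objective_with_entropy_bonus(1.0, [1.0, 0.0])
uniform = objective_with_entropy_bonus(1.0, [0.5, 0.5])
```

With equal expected reward, the uniform policy gets the higher score, which is the pressure that keeps the agent from committing to pure exploitation prematurely.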

 
 
Policy evaluation is the prediction step of policy iteration.
Policy improvement is the control step of policy iteration.
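The two roles can be sketched on a tiny example. The code below assumes a hypothetical deterministic 2-state MDP, where `P[s][a]` is the next state and `R[s][a]` is the reward; the MDP, the function names, and the discount `gamma=0.9` are illustrative assumptions. `policy_evaluation` predicts the value of a fixed policy, and `policy_improvement` exploits that prediction by acting greedily.

```python
def policy_evaluation(policy, P, R, gamma=0.9, tol=1e-8):
    """Prediction: estimate V for a fixed policy via repeated Bellman backups."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = policy[s]
            v = R[s][a] + gamma * V[P[s][a]]
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_improvement(V, P, R, gamma=0.9):
    """Control: pick the greedy action in each state under the current V."""
    return [
        max(range(len(P[s])), key=lambda a: R[s][a] + gamma * V[P[s][a]])
        for s in range(len(P))
    ]


def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(policy, P, R, gamma)
        new_policy = policy_improvement(V, P, R, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical MDP: action 1 always leads to state 1 and pays more.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 2.0]]
policy, V = policy_iteration(P, R)
```

In this toy MDP the loop settles on taking action 1 in both states, with values close to 19 and 20 for states 0 and 1.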
 
 
 
