RL Exploration

Created: 2019 Nov 5 5:18
Edited: 2024 May 22 1:33

High state entropy (not action entropy)
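
To make the state-vs-action distinction concrete, here is a minimal sketch that measures the Shannon entropy of an empirical sample. The helper name `empirical_entropy` and the toy trajectories are illustrative assumptions, not part of the original notes.

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical agent on a chain of states: it alternates left/right,
# so its actions look maximally random, yet it only ever sees two states.
actions = ["L", "R", "L", "R", "L", "R", "L", "R"]
states_visited = [1, 0, 1, 0, 1, 0, 1, 0]
states_covering = list(range(8))  # an agent that reaches 8 distinct states

action_entropy = empirical_entropy(actions)              # log(2): maximal for 2 actions
stuck_state_entropy = empirical_entropy(states_visited)  # log(2): far below log(8)
covering_state_entropy = empirical_entropy(states_covering)  # log(8)
```

High action entropy alone does not guarantee that the agent gathers new information; under these assumptions the state-visitation entropy is what separates the stuck agent from the one that covers the chain.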

Gather more information

Simplest: random actions (ε-greedy)
With (small) probability ε, act randomly (draw a random number; if it is below the threshold ε, take a random action) → the threshold is lowered gradually as learning proceeds → eventually reaches zero
With (large) probability 1-ε, act on current policy
  • with a fixed ε, the agent can keep thrashing around once learning is done
    • One solution: lower ε over time (decaying epsilon-greedy)
    • Another: explore areas whose badness is not (yet) established (optimism in the face of uncertainty)
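
The ε-greedy rule and its decaying variant can be sketched as follows. This is a minimal illustration, assuming a tabular list of action-value estimates and a linear decay schedule; the function names, the `q_values` list, and the schedule parameters are hypothetical choices, not something the notes specify.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: the random draw fell below the threshold epsilon.
        return rng.randrange(len(q_values))
    # Exploit: act on the current policy (highest estimated value).
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step, start=1.0, end=0.0, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# Usage: early steps are mostly random, late steps are fully greedy.
q_values = [0.1, 0.9, 0.3]  # assumed value estimates for three actions
for step in (0, 500, 1000):
    eps = decayed_epsilon(step)
    action = epsilon_greedy(q_values, eps)
```

A linear schedule is only one option; exponential decay or a nonzero floor for ε are common alternatives when the environment may change after learning.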