RL Exploration

Created: 2019 Nov 5 5:18
Edited: 2025 Apr 10 23:42

Aim for high state entropy (not action entropy): cover many different states, rather than just choosing varied actions

Gather more information
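The difference between state entropy and action entropy can be illustrated with a minimal sketch. It assumes discrete, hashable states and actions and uses plain Shannon entropy over empirical visit counts; the helper name `empirical_entropy` and the toy trajectory are made up for illustration.

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Toy trajectory (hypothetical): the agent is stuck against a wall.
# It alternates between two actions, yet never leaves state 0.
stuck_states = [0, 0, 0, 0]
stuck_actions = [0, 1, 0, 1]

state_h = empirical_entropy(stuck_states)    # no state coverage at all
action_h = empirical_entropy(stuck_actions)  # actions look "diverse"
```

Here the action entropy is log 2 while the state entropy is zero, so varied actions alone do not guarantee that new states are being visited.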

Simplest: random actions (ε-greedy)
With (small) probability ε, act randomly (i.e. when a uniform random draw falls below the threshold ε) → the threshold is lowered gradually as learning progresses → eventually reaches zero
With (large) probability 1-ε, act greedily on the current policy
  • can keep thrashing around once learning is done
    • One solution: lower ε over time (decaying ε-greedy)
      explore areas whose badness is not (yet) established (optimism in the face of uncertainty)
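A minimal sketch of ε-greedy action selection with a decaying ε, assuming a small discrete action set and a list of estimated action values. The function names, the linear schedule, and its default parameters are illustrative assumptions, not a prescribed implementation.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)

def decayed_epsilon(step, start=1.0, end=0.0, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# Usage: early steps explore almost always, late steps act greedily.
q = [0.1, 0.9, 0.3]  # hypothetical value estimates for three actions
early_action = epsilon_greedy(q, decayed_epsilon(0))
late_action = epsilon_greedy(q, decayed_epsilon(5000))
```

With `end=0.0` the agent stops exploring entirely once the schedule finishes, which avoids thrashing after learning is done; a small non-zero `end` is a common alternative when the environment may change.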

Methods

  • CBET
    for sparse-reward environments

Recommendations