RL Exploration

Created: 2019 Nov 5 5:18

Edited: 2025 Apr 10 23:42

Simplest: random actions (ε-greedy)

With (small) probability ε, act randomly (i.e. when a uniform random draw falls below the threshold ε) → the threshold is lowered gradually as learning progresses → it eventually reaches zero

With (large) probability 1-ε, act on current policy
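A minimal sketch of the two cases above, assuming a tabular setting where the current policy is "pick the action with the highest estimated value". The names `epsilon_greedy` and `q_values` are illustrative and not from these notes.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given estimated action values (hypothetical helper)."""
    # Explore: with probability epsilon, pick a uniformly random action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: otherwise follow the current (greedy) policy.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` this is purely greedy; with `epsilon=1.0` it is purely random.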

One solution: lower ε over time (decaying ε-greedy)
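One possible decay schedule, sketched under assumptions: a linear ramp from `eps_start` to `eps_end` over `decay_steps` steps. The schedule shape and all parameter names and defaults are illustrative; exponential decay is an equally common choice.

```python
def decayed_epsilon(step, eps_start=1.0, eps_end=0.0, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end (assumed schedule)."""
    # Fraction of the decay period that has elapsed, clamped to [0, 1].
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Setting `eps_end` slightly above zero keeps a little exploration forever, which can help if the environment changes.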

Alternative: explore areas whose badness is not (yet) established (optimism in the face of uncertainty)
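One common way to make this concrete is a UCB1-style bonus for a bandit: rarely tried actions get an optimistic boost to their estimated value. The notes do not name this method, so treat it as a swapped-in example; `ucb_action`, `counts`, and the constant `c` are assumptions.

```python
import math

def ucb_action(q_values, counts, c=2.0):
    """Pick the action with the highest optimistic value estimate (UCB1-style sketch)."""
    # An untried action has unknown badness, so try it first.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    # The bonus shrinks as an action is tried more often (less uncertainty).
    scores = [q + c * math.sqrt(math.log(total) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

Unlike ε-greedy, exploration here is directed: it goes to the actions we know least about instead of being spread uniformly.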