Boltzmann exploration

Creator

Seonglae Cho

Created

2025 Nov 26 13:38

Editor

Seonglae Cho

Edited

2025 Nov 26 13:41

Refs

Softmax Action Selection

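An exploration method that converts the Q-values of the available actions into a softmax probability distribution and samples the next action from it: π(a|s) ∝ exp(Q(s,a)/τ). The temperature τ controls how greedy the selection is: a low τ concentrates probability on the highest-valued action, while a high τ approaches uniform random selection.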
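The selection rule described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `boltzmann_probs` and `select_action` and the `temperature` parameter are assumptions made for this example, not part of any particular library.

```python
import math
import random


def boltzmann_probs(q_values, temperature=1.0):
    """Return softmax probabilities over Q-values at the given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its Boltzmann probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Calling `select_action([1.0, 2.0, 3.0], temperature=0.5)` would most often return index 2, while still occasionally sampling the lower-valued actions.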

MaxInfoRL

MaxInfoRL includes the information gain I(s,a) within the Boltzmann distribution. In other words, the policy itself is optimized to "select actions with higher information more frequently." This is not reward shaping in the style of curiosity maximization. The two terms, Q and I, carry the coefficients α1 and α2, which set the trade-off between them and are tuned automatically.
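A rough sketch of a Boltzmann distribution that includes an information-gain term is shown below. It assumes the two terms are combined as a weighted sum inside the exponent, which is an illustrative choice and may not match the exact parameterization used by MaxInfoRL. The function name `info_boltzmann_probs`, the parameters `alpha_q` and `alpha_info` (standing in for α1 and α2), and the information-gain values passed in are all hypothetical; auto-tuning of the coefficients is not implemented here.

```python
import math


def info_boltzmann_probs(q_values, info_gains, alpha_q=1.0, alpha_info=1.0):
    """Softmax over a weighted sum of Q-values and information gains.

    Assumed form (for illustration only):
        logit(a) = alpha_q * Q(s, a) + alpha_info * I(s, a)
    """
    if len(q_values) != len(info_gains):
        raise ValueError("q_values and info_gains must have the same length")
    logits = [alpha_q * q + alpha_info * i
              for q, i in zip(q_values, info_gains)]
    # Subtract the maximum logit for numerical stability.
    highest = max(logits)
    exps = [math.exp(l - highest) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this assumed form, two actions with equal Q-values are no longer selected equally often: the one with the larger information gain receives more probability. Setting `alpha_info` to zero recovers plain softmax action selection over the Q-values.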
