Boltzmann exploration

Creator: Seonglae Cho
Created: 2025 Nov 26 13:38
Edited: 2025 Nov 26 13:41

Softmax Action Selection

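An exploration method that converts each action's Q-value into a probability distribution with a softmax scaled by a temperature τ, then samples the action from that distribution. A higher τ makes the distribution closer to uniform (more exploration); as τ approaches 0, selection approaches greedy.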
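A minimal sketch of softmax action selection, assuming a discrete action space and an array of Q-values for the current state. The function names `boltzmann_probs` and `boltzmann_action` and the `temperature` parameter (τ) are illustrative, not taken from any specific library. It computes π(a|s) = exp(Q(s,a)/τ) / Σ_b exp(Q(s,b)/τ) and samples from it.

```python
import numpy as np


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values with temperature tau (illustrative helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = np.asarray(q_values, dtype=float) / temperature
    # Softmax is shift-invariant, so subtracting the max only prevents overflow.
    logits -= logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()


def boltzmann_action(q_values, temperature=1.0, rng=None):
    """Sample one action index from the Boltzmann distribution."""
    rng = np.random.default_rng() if rng is None else rng
    probs = boltzmann_probs(q_values, temperature)
    return int(rng.choice(len(probs), p=probs))


q = [1.0, 2.0, 0.5]
print(boltzmann_probs(q, temperature=0.01))    # almost all mass on action 1
print(boltzmann_probs(q, temperature=1000.0))  # close to uniform
print(boltzmann_action(q, temperature=1.0, rng=np.random.default_rng(0)))
```

The temperature is the only exploration knob here: unlike ε-greedy, actions with higher Q-values are always more likely than worse ones, and the gap shrinks as τ grows.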

MaxInfoRL

MaxInfoRL includes the information gain I(s,a) inside the Boltzmann distribution, so the policy itself is optimized to select more informative actions more frequently. This differs from reward shaping approaches such as curiosity maximization, which add an intrinsic bonus to the reward. Q and I are weighted by α1 and α2, which set the trade-off between the two terms and are tuned automatically.
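A hypothetical sketch of the idea, assuming a discrete action set and Boltzmann logits equal to the weighted sum α1·Q(s,a) + α2·I(s,a), with α1 and α2 passed in as fixed numbers. How MaxInfoRL actually parameterizes the distribution, estimates I(s,a), and auto-tunes α1 and α2 is not reproduced here; `maxinfo_boltzmann_probs` is an illustrative name.

```python
import numpy as np


def maxinfo_boltzmann_probs(q_values, info_gain, alpha1=1.0, alpha2=1.0):
    """Boltzmann distribution over alpha1*Q + alpha2*I.

    Assumption: this weighted-sum form is illustrative; alpha1 and alpha2
    are fixed inputs here rather than auto-tuned as described above.
    """
    q = np.asarray(q_values, dtype=float)
    i = np.asarray(info_gain, dtype=float)
    if q.shape != i.shape:
        raise ValueError("q_values and info_gain must have the same shape")
    logits = alpha1 * q + alpha2 * i
    logits -= logits.max()  # numerical stability; does not change the result
    weights = np.exp(logits)
    return weights / weights.sum()


q = [1.0, 1.0]        # both actions look equally rewarding
info = [0.0, 2.0]     # action 1 is expected to be more informative
print(maxinfo_boltzmann_probs(q, info, alpha1=1.0, alpha2=0.0))  # uniform
print(maxinfo_boltzmann_probs(q, info, alpha1=1.0, alpha2=1.0))  # favors action 1
```

With equal Q-values, setting alpha2 to 0 recovers plain softmax action selection, while a positive alpha2 shifts probability toward the action with higher information gain without modifying the reward.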