Softmax Action Selection
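An exploration method that converts the Q-values of the available actions into a probability distribution with softmax (a Boltzmann distribution) and samples the action from it, so higher-valued actions are chosen more often while lower-valued ones are still tried.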
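A minimal sketch of this selection rule, assuming a discrete action set with Q-values held in a plain list. The `temperature` parameter and the function names are illustrative assumptions, not something the note specifies.

```python
import math
import random

def softmax_probs(q_values, temperature=1.0):
    """Return softmax (Boltzmann) probabilities for a list of Q-values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max before exponentiating for numerical stability.
    max_q = max(q_values)
    exps = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A lower temperature pushes the distribution toward the greedy action, while a higher one pushes it toward uniform random selection.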
MaxInfoRL
Includes the information gain I(s,a) inside the Boltzmann distribution. In other words, the policy itself is optimized to "select actions with higher information more frequently." This is different from curiosity maximization, which works through reward shaping. Q and I are combined with coefficients α1 and α2, which set the trade-off between the two terms and are tuned automatically.
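The idea can be sketched as a Boltzmann distribution whose logits include an information-gain term. The parameterization below, logits = (Q + α2·I) / α1, is an assumption made for illustration; the note does not state how α1 and α2 enter, and the auto-tuning of both coefficients is not shown. The `info_gains` values are hypothetical inputs that would come from some separate estimate of I(s,a).

```python
import math

def info_boltzmann_probs(q_values, info_gains, alpha1=1.0, alpha2=1.0):
    """Boltzmann probabilities over actions using Q plus weighted info gain.

    Assumed form (illustrative only): logits = (Q + alpha2 * I) / alpha1.
    """
    if len(q_values) != len(info_gains):
        raise ValueError("q_values and info_gains must have the same length")
    if alpha1 <= 0:
        raise ValueError("alpha1 must be positive")
    logits = [(q + alpha2 * i) / alpha1 for q, i in zip(q_values, info_gains)]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this assumed form, two actions with equal Q-values are no longer equally likely: the one with the larger information gain gets more probability mass, and setting `alpha2=0` recovers plain softmax selection over Q.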

Seonglae Cho