Entropy regularization, scaled by a temperature coefficient α
Prevents the policy from collapsing into a deterministic one
Acts as exploration noise in continuous action spaces, playing the role that epsilon-greedy plays in discrete ones
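For reference, this is the maximum-entropy objective from the linked paper. ρ_π is the state-action distribution induced by π, and α is the temperature. Letting α → 0 recovers the standard expected-return objective. The last line gives the entropy of a 1-D Gaussian policy, a standard closed form. It shows why the bonus grows with σ and therefore penalizes a collapse toward a deterministic policy.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[\, r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot \mid s_t)) \,\big]

\mathcal{H}(\pi(\cdot \mid s)) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]

\mathcal{H}\big(\mathcal{N}(\mu, \sigma^2)\big) = \tfrac{1}{2}\log\!\big(2\pi e\,\sigma^2\big)
```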
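Note that maximizing entropy requires differentiating through the sampling distribution, because the actions in the expectation are drawn from the policy being optimized. The "reparameterization trick" makes this possible: draw noise ε independently of the parameters and write the action as a differentiable function of it, e.g. a = μ + σε for a Gaussian policy.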
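Below is a minimal 1-D sketch in plain Python. It assumes a Gaussian policy a = mu + sigma * eps and a hypothetical quadratic critic `q_value` (not from SAC) standing in for Q(s, a), with the state dropped. The loss E[α log π(a) − Q(a)] has the form of SAC's reparameterized policy loss. All function names here are illustrative only. Gradients are derived by hand rather than with an autodiff library, so the pathwise dependence on mu and sigma stays visible. SAC itself also squashes actions with tanh, which this sketch omits.

```python
import math
import random

# Hypothetical critic for a 1-D action: Q(a) = -(a - 1)^2, best action at a = 1.
def q_value(a):
    return -(a - 1.0) ** 2

def dq_da(a):
    return -2.0 * (a - 1.0)

def policy_loss_grad(mu, log_sigma, alpha, n_samples, rng):
    """Reparameterized Monte Carlo gradient of
    L(mu, log_sigma) = E_eps[alpha * log pi(a) - Q(a)],  a = mu + sigma * eps.

    Returns (dL/dmu, dL/dlog_sigma).
    """
    sigma = math.exp(log_sigma)
    g_mu = 0.0
    g_log_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of the parameters
        a = mu + sigma * eps       # reparameterized action
        # log pi(a) = -0.5 * ((a - mu) / sigma)**2 - log_sigma - 0.5 * log(2*pi).
        # With eps held fixed, (a - mu) / sigma == eps, so the total derivative
        # of log pi(a) is 0 w.r.t. mu and -1 w.r.t. log_sigma.
        dlogp_dmu = 0.0
        dlogp_dlog_sigma = -1.0
        # Chain rule through the sample: da/dmu = 1, da/dlog_sigma = sigma * eps.
        dq = dq_da(a)
        g_mu += alpha * dlogp_dmu - dq
        g_log_sigma += alpha * dlogp_dlog_sigma - dq * sigma * eps
    return g_mu / n_samples, g_log_sigma / n_samples

def train(alpha, steps=2000, lr=0.05, n_samples=64, seed=0):
    """Plain SGD on (mu, log_sigma) from mu=0, sigma=1; returns final (mu, sigma)."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        g_mu, g_ls = policy_loss_grad(mu, log_sigma, alpha, n_samples, rng)
        mu -= lr * g_mu
        log_sigma -= lr * g_ls
    return mu, math.exp(log_sigma)
```

For this toy critic, the expected loss is (mu − 1)² + sigma² − α·log sigma + const. That expression is minimized at mu = 1 and sigma = sqrt(α/2). As a result, `train(alpha=0.2)` should settle near sigma ≈ 0.32. With `train(alpha=0.0)`, the only pressure on sigma is from the critic, so sigma keeps shrinking toward a deterministic policy.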

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement...
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major...
https://arxiv.org/abs/1801.01290


Seonglae Cho