Entropy Regularization scaled by a temperature coefficient
Prevents the policy from collapsing to a deterministic one
Provides exploration noise in continuous action spaces, playing the role that epsilon-greedy plays in discrete ones
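The notes do not write the objective out. A common way to state it, assumed here, is the maximum-entropy form below, where the temperature α, the discount γ, and the reward r are notation introduced for illustration:

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t} \gamma^{t} \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]
```

A larger α puts more weight on entropy and keeps the policy more random. With α = 0 this reduces to the standard expected-return objective.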
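Note that the entropy term is an expectation over actions sampled from the policy, so maximizing it requires differentiating through the sampling step. We can do this via the “reparameterization trick”: write the sampled action as a deterministic function of the policy parameters and an independent noise variable.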
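A minimal sketch of the trick for a one-dimensional Gaussian policy, using only the standard library. The critic `q_value`, its derivative `dq_da`, and the function `reparam_gradients` are hypothetical names made up for this illustration. The action is written as a = mu + sigma * eps with eps drawn from N(0, 1), so the gradient of J = E[Q(a) - alpha * log pi(a)] can be taken through the sample by the chain rule.

```python
import math
import random


def q_value(a):
    # Hypothetical critic (illustration only): prefers actions near 2.0.
    return -(a - 2.0) ** 2


def dq_da(a):
    # Derivative of the hypothetical critic with respect to the action.
    return -2.0 * (a - 2.0)


def reparam_gradients(mu, log_sigma, alpha, n_samples=20000, seed=0):
    """Monte Carlo pathwise gradient of J = E[Q(a) - alpha * log pi(a)].

    Returns (dJ/dmu, dJ/dlog_sigma) for a Gaussian policy N(mu, sigma^2).
    """
    rng = random.Random(seed)
    sigma = math.exp(log_sigma)
    g_mu = 0.0
    g_log_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise independent of the parameters
        a = mu + sigma * eps        # reparameterized action sample
        # Chain rule through the sample: da/dmu = 1, da/dlog_sigma = sigma * eps.
        g_mu += dq_da(a)
        g_log_sigma += dq_da(a) * sigma * eps
        # Along the sample path, log pi(a) = -eps^2/2 - log_sigma - log(2*pi)/2,
        # so the entropy bonus -alpha * log pi(a) adds alpha to dJ/dlog_sigma
        # and nothing to dJ/dmu.
        g_log_sigma += alpha
    return g_mu / n_samples, g_log_sigma / n_samples


if __name__ == "__main__":
    print(reparam_gradients(mu=0.0, log_sigma=0.0, alpha=0.2))
```

For this toy critic the exact gradients are -2(mu - 2) and -2 sigma^2 + alpha, so the estimates can be checked by hand. Setting the second one to zero gives sigma^2 = alpha / 2: with alpha > 0 the optimum keeps some noise, while alpha = 0 drives sigma toward zero, which is the deterministic collapse the entropy term is meant to prevent.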