Uses a stochastic policy, unlike DDPG
A central feature of SAC is entropy regularization. The policy is trained to maximize a trade-off between expected return and entropy. (A deterministic policy is the zero-variance limit of a stochastic policy.)
An entropy bonus term, weighted by a temperature hyperparameter, is added to the objective being maximized, which is effective for continuous action spaces.
Epsilon-greedy handles exploration for discrete actions; in the continuous case, entropy measures how random the policy is. A more random policy is preferred in RL because it enables more exploration.
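A minimal PyTorch sketch of this idea (network sizes, the stand-in critic, and variable names are illustrative assumptions, not the reference implementation): sample actions from a squashed Gaussian policy with the reparameterization trick and minimize α·log π(a|s) − Q(s, a), i.e. maximize return plus entropy.

```python
# Sketch: squashed Gaussian policy and the entropy-regularized actor loss.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def sample(self, obs: torch.Tensor):
        h = self.body(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                      # reparameterized sample (keeps gradients)
        a = torch.tanh(u)                       # squash action to [-1, 1]
        # log-prob with tanh change-of-variables correction
        log_prob = dist.log_prob(u).sum(-1) - torch.log(1 - a.pow(2) + 1e-6).sum(-1)
        return a, log_prob

obs_dim, act_dim, alpha = 3, 1, 0.2             # alpha = entropy temperature
policy = GaussianPolicy(obs_dim, act_dim)
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in critic

obs = torch.randn(32, obs_dim)                  # dummy batch of states
action, log_prob = policy.sample(obs)
q_value = q_net(torch.cat([obs, action], dim=-1)).squeeze(-1)
actor_loss = (alpha * log_prob - q_value).mean()  # maximize Q + alpha * entropy
actor_loss.backward()                             # gradients flow through rsample()
```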
Soft Q value
The soft Q value augments the standard expected return with an entropy bonus scaled by the temperature parameter α.
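Following the SAC paper, the soft Bellman backup and the soft state value with temperature $\alpha$ are:

$$
Q(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\big[ V(s_{t+1}) \big],
\qquad
V(s_t) = \mathbb{E}_{a_t \sim \pi}\big[ Q(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \big]
$$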

Automatically tuning the temperature hyperparameter
The Max Ent RL objective is

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
$$

In the automatic-tuning variant, α is treated as a learnable Lagrange multiplier: it is adjusted so that the policy's entropy stays near a chosen target rather than being fixed by hand.
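A minimal sketch of the automatic tuning step, assuming a log-parameterized α and the common target-entropy heuristic of −dim(A) (variable names and the dummy batch are illustrative):

```python
# Sketch: automatic temperature (alpha) tuning toward a target entropy.
import torch

act_dim = 1
target_entropy = -float(act_dim)                 # common heuristic: -|A|
log_alpha = torch.zeros(1, requires_grad=True)   # learn log(alpha) so alpha stays positive
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

# log_prob comes from the current policy's sampled actions; detached because
# this update moves alpha only, not the policy.
log_prob = torch.randn(32)                       # dummy stand-in for pi's log-probs

alpha_loss = -(log_alpha.exp() * (log_prob + target_entropy).detach()).mean()
alpha_opt.zero_grad()
alpha_loss.backward()
alpha_opt.step()

alpha = log_alpha.exp().detach()                 # use the updated alpha in actor/critic losses
```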
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
https://arxiv.org/abs/1801.01290

Soft Actor-Critic — Spinning Up documentation
Soft Actor Critic (SAC) is an algorithm that optimizes a stochastic policy in an off-policy way, forming a bridge between stochastic policy optimization and DDPG-style approaches. It isn’t a direct successor to TD3 (having been published roughly concurrently), but it incorporates the clipped double-Q trick, and due to the inherent stochasticity of the policy in SAC, it also winds up benefiting from something like target policy smoothing.
https://spinningup.openai.com/en/latest/algorithms/sac.html
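As a rough illustration of the clipped double-Q trick mentioned above (the policy, target critics, and tensor shapes are assumptions, reusing the policy sketch from earlier): the critic's Bellman target takes the minimum of two target Q networks and subtracts the entropy term.

```python
# Sketch: clipped double-Q target for the SAC critic update.
import torch

def sac_target(reward, done, next_obs, policy, q1_targ, q2_targ, alpha, gamma=0.99):
    """Bellman target using the minimum of two target critics (clipped double-Q)."""
    with torch.no_grad():
        next_action, next_log_prob = policy.sample(next_obs)
        q_in = torch.cat([next_obs, next_action], dim=-1)
        min_q = torch.min(q1_targ(q_in), q2_targ(q_in)).squeeze(-1)
        soft_value = min_q - alpha * next_log_prob          # soft state value V(s')
        return reward + gamma * (1.0 - done) * soft_value
```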

Seonglae Cho