Soft actor-critic (SAC)

Creator
Seonglae Cho
Created
2023 Sep 10 7:54
Edited
2025 Feb 28 10:49
Refs

SAC uses a stochastic policy, unlike
DDPG

A central feature of SAC is entropy regularization: the policy is trained to maximize a trade-off between expected return and entropy. (A deterministic policy is just the zero-variance limit of a stochastic policy.) SAC adds an entropy bonus term, scaled by a temperature hyperparameter, to the maximization objective, which is effective for continuous action spaces.
Epsilon Greedy
serves this role in the discrete case; in the continuous case, entropy measures how random the policy is. A random policy is preferred in RL because it enables more
RL Exploration
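To make the entropy term concrete, here is a minimal NumPy sketch (function names are hypothetical) of the entropy of a discrete policy and of a Gaussian policy:

```python
import numpy as np

# Entropy of a discrete policy: H(pi) = -sum_a pi(a|s) log pi(a|s).
# A uniform (fully random) policy maximizes entropy; a greedy one has zero.
def policy_entropy(probs):
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]  # convention: 0 * log 0 = 0
    return float(-np.sum(nz * np.log(nz)))

# For a continuous Gaussian policy N(mu, sigma^2), the differential entropy
# is 0.5 * log(2 * pi * e * sigma^2): it grows with the policy's spread.
def gaussian_entropy(sigma):
    return float(0.5 * np.log(2 * np.pi * np.e * sigma**2))

uniform = policy_entropy([0.25, 0.25, 0.25, 0.25])  # maximal for 4 actions
greedy = policy_entropy([1.0, 0.0, 0.0, 0.0])       # deterministic -> 0
```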
 
 

Soft Q value

with temperature parameter $\alpha$ for entropy:

$$Q_{\text{soft}}(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}} \big[ V_{\text{soft}}(s_{t+1}) \big], \qquad V_{\text{soft}}(s_t) = \mathbb{E}_{a_t \sim \pi} \big[ Q_{\text{soft}}(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \big]$$
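For a discrete action set, the soft value and the soft Bellman backup can be sketched as below (a hypothetical NumPy illustration with tabular Q-values, not the actual actor-critic implementation, which uses function approximators):

```python
import numpy as np

# Soft state value for discrete actions:
# V_soft(s) = alpha * log sum_a exp(Q(s,a) / alpha), a "soft max" over Q
# that approaches max_a Q(s,a) as the temperature alpha -> 0.
def soft_value(q_values, alpha):
    q = np.asarray(q_values, dtype=float)
    m = q.max()  # subtract the max for numerical stability
    return float(m + alpha * np.log(np.sum(np.exp((q - m) / alpha))))

# Soft Bellman backup: Q_soft(s,a) = r + gamma * V_soft(s').
def soft_q_backup(reward, next_q_values, alpha, gamma=0.99):
    return reward + gamma * soft_value(next_q_values, alpha)
```

As the temperature shrinks, `soft_value` collapses to the ordinary greedy max, which matches the note above that a deterministic policy is the zero-variance limit.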
 

Automatically tuning temperature hyperparameter

The maximum-entropy RL objective is

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \big[ r(s_t, a_t) + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t)) \big]$$

and the temperature is tuned automatically by minimizing

$$J(\alpha) = \mathbb{E}_{a_t \sim \pi} \big[ -\alpha \log \pi(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \big]$$

where $\bar{\mathcal{H}}$ is a target entropy (commonly $-\dim(\mathcal{A})$ for continuous action spaces).
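A hedged sketch of one automatic temperature update (plain NumPy, names hypothetical): alpha is adjusted so the policy's entropy tracks the target, and parameterizing by log alpha keeps it positive.

```python
import numpy as np

# Loss: J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)].
# One gradient-descent step on log_alpha (keeps alpha = exp(log_alpha) > 0).
def alpha_update(log_alpha, log_probs, target_entropy, lr=1e-3):
    log_probs = np.asarray(log_probs, dtype=float)
    # dJ/d(log_alpha) = -alpha * mean(log_pi + target_entropy)
    grad = -np.exp(log_alpha) * float(np.mean(log_probs + target_entropy))
    return log_alpha - lr * grad
```

When the policy's entropy falls below the target (log-probs too high), the update raises alpha to push the entropy bonus back up, and vice versa.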
 
 
 
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement...
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major...
Soft Actor-Critic — Spinning Up documentation
Soft Actor Critic (SAC) is an algorithm that optimizes a stochastic policy in an off-policy way, forming a bridge between stochastic policy optimization and DDPG-style approaches. It isn’t a direct successor to TD3 (having been published roughly concurrently), but it incorporates the clipped double-Q trick, and due to the inherent stochasticity of the policy in SAC, it also winds up benefiting from something like target policy smoothing.
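The clipped double-Q target mentioned above can be sketched as follows (a hypothetical scalar helper for clarity; real implementations operate on batched tensors):

```python
# Clipped double-Q target, as used by SAC (and TD3): taking the minimum of
# two Q estimates counters overestimation bias, and the entropy term
# -alpha * log pi(a'|s') makes it the SAC target.
def sac_target(reward, q1_next, q2_next, log_prob_next, alpha,
               gamma=0.99, done=False):
    q_min = min(q1_next, q2_next)
    return reward + gamma * (1.0 - float(done)) * (q_min - alpha * log_prob_next)
```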
 
 

 
