Soft actor-critic (SAC)

Creator: Seonglae Cho
Created: 2023 Sep 10 7:54
Edited: 2024 May 23 4:09
Refs

SAC uses a stochastic policy, unlike the deterministic policy of DDPG.

A central feature of SAC is entropy regularization: the policy is trained to maximize a trade-off between expected return and entropy. (A deterministic policy is just the zero-variance limit of a stochastic policy.) SAC adds an entropy bonus term, weighted by a temperature hyperparameter, to the objective being maximized, which is effective for continuous action spaces. Epsilon Greedy plays a similar exploration role in the discrete case; in the continuous case, entropy measures how random the policy is. A random policy is preferred in RL because it enables more RL Exploration.
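As a rough illustration (not from the original note), the differential entropy of a 1-D Gaussian policy shows why a deterministic policy is the zero-variance, minimum-entropy limit of a stochastic one:

```python
import numpy as np

def gaussian_entropy(sigma):
    """Differential entropy of a 1-D Gaussian policy N(mu, sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

# A wider (more random) policy has higher entropy; as sigma -> 0
# (the deterministic limit), entropy decreases without bound.
for sigma in [1.0, 0.1, 0.01]:
    print(f"sigma={sigma}: entropy={gaussian_entropy(sigma):.3f}")
```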
Soft Q value

The soft Q value augments the standard Bellman backup with an entropy term weighted by the temperature parameter α.
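The soft Bellman equations from the SAC papers take this standard form (reconstructed here, since the page's original image is missing):

```latex
Q(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\!\left[ V(s_{t+1}) \right]

V(s_t) = \mathbb{E}_{a_t \sim \pi}\!\left[ Q(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \right]
```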
Automatically tuning temperature hyperparameter

The Max Ent RL objective is:
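In the standard form from Haarnoja et al., with α the temperature and H the policy entropy (reconstructed here, since the page's original image is missing):

```latex
J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```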
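A minimal sketch (not from the original note) of the automatic temperature update used in SAC: α is adjusted by gradient descent on J(α) = E[−α (log π(a|s) + H_target)], which drives the policy's entropy toward a target value. The sampled log-probabilities below are hypothetical stand-ins for a real policy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical log-probs of actions sampled from the current policy.
log_probs = rng.normal(loc=-1.0, scale=0.5, size=256)

target_entropy = -1.0   # common heuristic: -|action_dim|
log_alpha = 0.0         # optimize log(alpha) so alpha stays positive
lr = 1e-2

for _ in range(1000):
    alpha = np.exp(log_alpha)
    # J(alpha) = E[-alpha * (log_pi + target_entropy)]
    # dJ/dlog_alpha = -alpha * mean(log_pi + target_entropy)
    grad = -alpha * np.mean(log_probs + target_entropy)
    log_alpha -= lr * grad

# If the policy's entropy (-mean log_prob) exceeds the target, alpha shrinks;
# if it falls below the target, alpha grows, pushing the policy to stay random.
print(np.exp(log_alpha))
```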