Entropy Bonus

Creator
Seonglae Cho
Created
2024 May 1 4:34
Edited
2024 Jun 16 11:10

Entropy regularization scaled by a temperature coefficient $\beta$
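
As a sketch of where the coefficient enters, one common way to write an entropy-regularized objective is given here. The reward $r$, discount $\gamma$, and trajectory $\tau$ are notation assumed for illustration and do not appear in this note.

```latex
J(\pi) = \mathbb{E}_{\tau\sim\pi}\left[\sum_{t} \gamma^{t}\Big(r(s_t, a_t) + \beta\, H(\pi(\cdot|s_t))\Big)\right]
```

With $\beta = 0$ this reduces to the usual expected return; a larger $\beta$ puts more weight on keeping the policy stochastic.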

Prevents the policy from collapsing into a deterministic one

Acts as exploration noise in continuous action spaces, playing a role similar to Epsilon Greedy
$$H(\pi(\cdot|s)) = \mathbb{E}_{a\sim\pi}[-\log\pi(a|s)]$$
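
For a discrete action distribution the expectation above is a finite sum, so the entropy can be computed directly. The following is a minimal sketch using only the Python standard library. The function names, the example probabilities, and the default `beta=0.01` are hypothetical choices for illustration, not values from this note.

```python
import math


def categorical_entropy(probs):
    """Entropy H(pi(.|s)) = E_{a~pi}[-log pi(a|s)] of a discrete action distribution."""
    # Actions with zero probability contribute nothing (p * log p -> 0 as p -> 0).
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def loss_with_entropy_bonus(policy_loss, probs, beta=0.01):
    """Subtract beta * entropy from a loss that is being minimized (assumed setup)."""
    return policy_loss - beta * categorical_entropy(probs)


uniform = [0.25, 0.25, 0.25, 0.25]        # maximally stochastic over 4 actions
peaked = [0.97, 0.01, 0.01, 0.01]         # nearly deterministic
deterministic = [1.0, 0.0, 0.0, 0.0]      # one action with probability 1

print(categorical_entropy(uniform))        # equals log(4), the maximum for 4 actions
print(categorical_entropy(peaked))         # much smaller than the uniform case
print(categorical_entropy(deterministic))  # zero entropy
print(loss_with_entropy_bonus(1.0, uniform, beta=0.1))
```

Because the bonus is subtracted from a minimized loss, a more stochastic policy lowers the loss, which is what discourages collapse to a single action.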
Note that maximizing entropy requires differentiating through the sampling distribution. We can do this via the “re-parametrization trick”.
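
The idea can be sketched for a one-dimensional Gaussian policy, which is an assumption made here for illustration. Instead of sampling $a \sim \mathcal{N}(\mu, \sigma^2)$ directly, sample fixed noise $\epsilon \sim \mathcal{N}(0, 1)$ and set $a = \mu + \sigma\epsilon$, so the sampled action becomes a differentiable function of the policy parameters. The example below uses a central finite difference as a stand-in for automatic differentiation; all names and constants are hypothetical.

```python
import math
import random


def log_prob(a, mu, log_sigma):
    """Log-density of a Gaussian policy N(mu, sigma^2) at action a."""
    sigma = math.exp(log_sigma)
    return -0.5 * ((a - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)


def entropy_estimate(mu, log_sigma, noise):
    """Monte Carlo estimate of H = E[-log pi(a|s)] from reparametrized samples."""
    sigma = math.exp(log_sigma)
    # a = mu + sigma * eps: the randomness lives in eps, so each action is a
    # deterministic, differentiable function of (mu, log_sigma).
    actions = [mu + sigma * eps for eps in noise]
    return sum(-log_prob(a, mu, log_sigma) for a in actions) / len(actions)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # fixed eps ~ N(0, 1)

mu, log_sigma, h = 0.0, -1.0, 1e-4

# Differentiate through the sampled actions, reusing the same eps on both sides.
grad = (entropy_estimate(mu, log_sigma + h, noise)
        - entropy_estimate(mu, log_sigma - h, noise)) / (2 * h)

# Closed-form Gaussian entropy, used only as a sanity check.
closed_form = 0.5 * math.log(2 * math.pi * math.e) + log_sigma

print(entropy_estimate(mu, log_sigma, noise), closed_form)
print(grad)  # close to 1, the analytic dH/d(log_sigma) for a Gaussian
```

Holding the noise fixed is what makes the estimate a smooth function of `log_sigma`; resampling actions independently for each evaluation would bury the gradient in sampling noise.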