DDPG

Creator: Seonglae Cho
Created: 2023 Sep 10 8:50
Edited: 2024 May 3 14:56

Deep Deterministic Policy Gradient

Value-based, actor-critic, continuous-action version of
DQN
(a differentiable actor network makes it possible to evaluate Q-values over a continuous action space)
  • Rarely used anymore because it is sensitive to hyperparameters
  • DDPG uses target networks for both the policy and the value function
  • Continuous actions → Q is differentiable w.r.t. the action → find the greedy policy via gradient ascent on Q
 

Off-policy actor-critic

Finding the greedy policy with continuous actions: approximate the argmax with a deterministic policy, μ(s) ≈ argmax_a Q(s, a)
 
 
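The deterministic actor update above can be sketched with a toy example (not from the note): a known quadratic critic Q(s, a) = −(a − 2s)², whose greedy action is a* = 2s, and a linear actor μ_θ(s) = θs. DDPG ascends Q(s, μ(s)) via the chain rule ∇_θ J = E[∂Q/∂a · ∂μ/∂θ]; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0   # actor parameter; the greedy coefficient is 2
lr = 0.05
for _ in range(200):
    s = rng.uniform(-1, 1, size=64)   # batch of states
    a = theta * s                     # deterministic actions mu_theta(s)
    dq_da = -2.0 * (a - 2.0 * s)      # critic gradient w.r.t. the action
    grad = np.mean(dq_da * s)         # chain rule: dQ/da * dmu/dtheta
    theta += lr * grad                # gradient ascent on Q(s, mu(s))
print(round(theta, 2))                # converges toward the greedy coefficient 2
```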

Soft target update

The target networks are updated slowly by Polyak averaging, θ′ ← τθ + (1 − τ)θ′ with small τ, which stabilizes the bootstrapped targets. Target policy smoothing additionally makes it harder for the policy to exploit errors in the Q-function.
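A minimal sketch of the soft target update, with network parameters stored as plain numpy arrays (names are illustrative):

```python
import numpy as np

tau = 0.005  # small tau: target parameters track the online parameters slowly

def soft_update(target, online, tau):
    """Polyak averaging: theta' <- (1 - tau) * theta' + tau * theta."""
    for k in target:
        target[k] = (1 - tau) * target[k] + tau * online[k]
    return target

online = {"w": np.ones(3)}
target = {"w": np.zeros(3)}
target = soft_update(target, online, tau)
print(target["w"])  # each entry moves a small step (tau) toward the online value
```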
 
 

How to overcome DDPG's overestimation weakness caused by the deterministic policy

The deterministic policy can quickly overfit to a noisy target Q-function because it overestimates the target value
  • Learn two Q-functions and use the minimum as the target (clipped double Q-learning; two is enough)
  • Smooth the target policy
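The two fixes above can be sketched together in the target computation: add clipped noise to the target policy's action, then bootstrap from the minimum of two target critics. The critics and actor below are hypothetical stand-ins, not a real network.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, noise_std, noise_clip, act_limit = 0.99, 0.2, 0.5, 1.0

def target_actor(s):                       # hypothetical target policy mu'(s)
    return np.tanh(s)

def q1(s, a): return -(a - s) ** 2         # hypothetical target critic 1
def q2(s, a): return -(a - s) ** 2 + 0.1   # hypothetical target critic 2

s2 = rng.uniform(-1, 1, size=8)            # batch of next states
r = np.ones(8)                             # batch of rewards
# target policy smoothing: clipped noise on the target action
eps = np.clip(rng.normal(0.0, noise_std, size=8), -noise_clip, noise_clip)
a2 = np.clip(target_actor(s2) + eps, -act_limit, act_limit)
# clipped double Q-learning: pessimistic (minimum) bootstrap target
y = r + gamma * np.minimum(q1(s2, a2), q2(s2, a2))
```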
 
 
 

TD3 (Twin Delayed DDPG)

Improves training stability using clipped double Q-learning, delayed policy updates, and target policy smoothing with clipped noise
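The "delayed" part can be sketched as a bare counting loop (illustrative, no real networks): both critics are updated every step, while the actor and the soft target updates run only every `policy_delay` steps.

```python
policy_delay = 2  # TD3 commonly delays the actor by a factor of 2
critic_updates, actor_updates = 0, 0
for step in range(10):
    critic_updates += 1        # both critics trained every step
    if step % policy_delay == 0:
        actor_updates += 1     # delayed actor update + soft target update
print(critic_updates, actor_updates)  # prints: 10 5
```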
 
 
 
 
 
 
