Temporal Difference

Creator
Seonglae Cho
Created
2023 Sep 10 7:57
Edited
2024 Apr 27 14:17

TD learning

“bootstrapped” estimation

Temporal difference target (TD target)

$$r_t^i + V_\phi^{\pi_\theta}(s_{t+1}^i)$$
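
A minimal sketch of how such targets might be computed for a batch of sampled transitions, assuming a tabular value estimate stored in a dict. The names `td_targets`, `values`, and the `done` flag are illustrative assumptions and do not come from these notes. As in the expression above, no discount factor is applied.

```python
def td_targets(transitions, values):
    """Return the bootstrapped target r + V(s') for each transition.

    transitions: list of (state, reward, next_state, done) tuples.
    values: dict mapping state -> current value estimate (hypothetical tabular V).
    """
    targets = []
    for state, reward, next_state, done in transitions:
        # Terminal transitions have no successor value to bootstrap from.
        bootstrap = 0.0 if done else values.get(next_state, 0.0)
        targets.append(reward + bootstrap)
    return targets


values = {"A": 0.5, "B": 1.0}
transitions = [("A", 1.0, "B", False), ("B", 2.0, "end", True)]
print(td_targets(transitions, values))
```

The target for state "A" reuses the current estimate of "B" instead of waiting for the full return, which is what makes the estimate "bootstrapped".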

Supervised regression (TD error)

$$L(\phi) = \frac{1}{2}\sum_i \left\| V_\phi^{\pi_\theta}(s^i) - y_i \right\|^2$$
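
The regression step can be sketched as follows, again assuming a tabular value function so that the gradient is easy to write by hand. The helper names `td_loss` and `td_update` and the learning rate `lr` are hypothetical. The targets y_i are treated as fixed constants, so no gradient flows through them.

```python
def td_loss(values, states, targets):
    """0.5 * sum of squared differences between V(s_i) and fixed targets y_i."""
    return 0.5 * sum((values[s] - y) ** 2 for s, y in zip(states, targets))


def td_update(values, states, targets, lr=0.1):
    """One gradient-descent step on the loss for a tabular V.

    With the targets held constant, the gradient of the loss with respect
    to V(s_i) is V(s_i) - y_i, summed over every sample that visits s_i.
    """
    new_values = dict(values)
    for s, y in zip(states, targets):
        # Use the pre-update value so repeated states accumulate a batch gradient.
        new_values[s] -= lr * (values[s] - y)
    return new_values


values = {"A": 0.5, "B": 1.0}
states = ["A", "B"]
targets = [2.0, 2.0]

before = td_loss(values, states, targets)
values = td_update(values, states, targets)
after = td_loss(values, states, targets)
print(before, after)
```

Each step moves V(s_i) a fraction `lr` of the way toward its target, so the loss on this batch decreases. In practice the targets would be recomputed from the updated values before the next step.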
