Temporal Difference

Creator: Seonglae Cho
Created: 2023 Sep 10 7:57
Edited: 2025 Jul 18 1:24

TD learning

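Calculates the TD error (δ) as the difference between the bootstrapped target (observed reward + discounted value of the next state) and the current predicted value of the state
  • State-value TD (V-learning): learns the state value V(s)
  • Advantage: isolates only the effect of the action, A(s, a) = Q(s, a) − V(s)
"Bootstrapped" estimation: the estimate is updated immediately at each step, using the current estimate of the next state's value instead of waiting for the full return (as Monte Carlo methods do)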
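A minimal tabular TD(0) sketch of the points above. It computes δ as the target minus the prediction and updates at every step instead of at episode end. The three-state chain, the step size `alpha`, the discount `gamma`, and the names `td0_update` and `run_chain` are all hypothetical illustration choices, not taken from the source.

```python
from typing import Dict, List, Optional, Tuple

# (state, reward, next_state); next_state is None when the episode terminates
Transition = Tuple[int, float, Optional[int]]


def td0_update(V: Dict[int, float], s: int, r: float, s_next: Optional[int],
               alpha: float, gamma: float) -> float:
    """Apply one TD(0) update to V in place and return the TD error."""
    # Bootstrapped target: observed reward + discounted current estimate of the next state.
    # A terminal next state contributes a value of 0.
    next_value = 0.0 if s_next is None else V[s_next]
    td_target = r + gamma * next_value
    delta = td_target - V[s]   # TD error: target minus current prediction
    V[s] += alpha * delta      # move the prediction a step of size alpha toward the target
    return delta


def run_chain(episodes: int, alpha: float = 0.5, gamma: float = 0.9) -> Dict[int, float]:
    """Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step."""
    V = {0: 0.0, 1: 0.0, 2: 0.0}
    episode: List[Transition] = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
    for _ in range(episodes):
        for s, r, s_next in episode:
            # Update right after each step; no need to wait for the episode's full return.
            td0_update(V, s, r, s_next, alpha, gamma)
    return V


if __name__ == "__main__":
    values = run_chain(episodes=200)
    # Should approach the true values gamma**2 = 0.81, gamma = 0.9 and 1.0.
    print({s: round(v, 3) for s, v in values.items()})
```

With a value estimate held fixed, the returned `delta` is also the quantity commonly used as a one-step advantage estimate in actor-critic methods.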

Temporal difference target (TD target)

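The standard one-step (TD(0)) form of the target, the error, and the update is shown below. It assumes a discount factor γ and a step size α, and takes V(s_{t+1}) = 0 when s_{t+1} is terminal.

```latex
y_t = r_{t+1} + \gamma\, V(s_{t+1}), \qquad
\delta_t = y_t - V(s_t) = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t), \qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```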

Supervised regression (fit V to the TD target by minimizing the squared TD error)
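One way to read the regression view is to treat the TD target y = r + γV_w(s') as a fixed label and fit a parametric V_w(s) by minimizing ½(y − V_w(s))². The gradient of that loss is −δ ∇V_w(s). The sketch below makes several assumptions: a linear model over hypothetical one-hot features, a hypothetical three-state chain, and hypothetical names `fitted_td`, `predict` and `chain_data`. It alternates between freezing the labels and running ordinary regression, which is similar in spirit to the frozen targets used by target networks.

```python
from typing import List, Optional, Tuple

Features = List[float]
# (phi(s), reward, phi(s')); phi(s') is None when s' is terminal
Transition = Tuple[Features, float, Optional[Features]]


def predict(w: List[float], phi: Features) -> float:
    """Linear value estimate V_w(s) = w . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def fitted_td(data: List[Transition], n_features: int, gamma: float = 0.9,
              outer_iters: int = 50, inner_steps: int = 100, lr: float = 0.5) -> List[float]:
    w = [0.0] * n_features
    for _ in range(outer_iters):
        # 1) Freeze the current weights and build labels y = r + gamma * V_w(s').
        targets = [r + (0.0 if phi_next is None else gamma * predict(w, phi_next))
                   for _, r, phi_next in data]
        # 2) Plain supervised regression: gradient descent on half the mean squared error
        #    between V_w(s) and the fixed labels (no gradient flows through the labels).
        for _ in range(inner_steps):
            grad = [0.0] * n_features
            for (phi, _, _), y in zip(data, targets):
                err = predict(w, phi) - y  # equals minus the TD error for this sample
                for j, x in enumerate(phi):
                    grad[j] += err * x / len(data)
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def one_hot(i: int, n: int = 3) -> Features:
    return [1.0 if j == i else 0.0 for j in range(n)]


def chain_data() -> List[Transition]:
    """Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step."""
    return [(one_hot(0), 0.0, one_hot(1)),
            (one_hot(1), 0.0, one_hot(2)),
            (one_hot(2), 1.0, None)]


if __name__ == "__main__":
    # Should approach [0.81, 0.9, 1.0] for gamma = 0.9.
    print([round(v, 3) for v in fitted_td(chain_data(), n_features=3)])
```

If you take a single gradient step per sample instead of regressing to convergence, you get the usual online semi-gradient TD(0) update w ← w + α δ φ(s) for a linear model.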

