TD learning
Calculates the TD error (δ) as the difference between the bootstrapped target, "observed reward + discounted next-state value", and the current "predicted value"
- State‑value TD (V-learning)
- Action‑value TD (Q learning/SARSA)
- Advantage: isolates only the effect of the action
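
The variants above differ mainly in which value function appears in the error. A sketch in standard textbook notation; the discount factor γ and the symbols V, Q, A are the usual conventions, assumed here and not defined in this note:

```latex
\begin{align*}
\delta_t &= r_{t+1} + \gamma V(s_{t+1}) - V(s_t) && \text{(state-value TD)} \\
\delta_t &= r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) && \text{(SARSA)} \\
\delta_t &= r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) && \text{(Q-learning)} \\
A(s_t, a_t) &= Q(s_t, a_t) - V(s_t) && \text{(advantage)}
\end{align*}
```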
“Bootstrapped” estimation: the value estimate is updated immediately at each step, using the current estimate of the next state's value instead of waiting for the full return
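
A minimal sketch of the per-step bootstrapped update, assuming a tabular state-value function. The names `td0_update` and `run_chain`, the step size `alpha`, the discount `gamma`, and the 3-state chain are all hypothetical and chosen only for illustration:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # TD target: observed reward plus the discounted, bootstrapped
    # estimate of the next state's value (no bootstrap at a terminal step).
    target = r if terminal else r + gamma * V[s_next]
    delta = target - V[s]      # TD error
    V[s] += alpha * delta      # move the prediction toward the target
    return delta


def run_chain(episodes=500, alpha=0.1, gamma=0.9):
    """Toy chain (assumed): 0 -> 1 -> 2 (terminal), reward 1 on the last step."""
    V = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        # Each transition triggers an update right away, before the episode ends.
        td0_update(V, 0, 0.0, 1, alpha, gamma)
        td0_update(V, 1, 1.0, 2, alpha, gamma, terminal=True)
    return V
```

On this chain the estimates should approach `V[1] = 1` and `V[0] = gamma`, since the only reward arrives one step after state 1 and two steps after state 0.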
Temporal difference target (TD target)

Supervised regression (TD error)
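
One common way to read the two captions above together, sketched under the assumption of a parameterized value function V_θ with step size α (both symbols are assumptions, not from this note): the TD target plays the role of a regression label, and the TD error is the residual of that regression. The target is treated as a constant when differentiating, which gives the semi-gradient update.

```latex
\begin{align*}
y_t &= r_{t+1} + \gamma V_\theta(s_{t+1}) && \text{(TD target, held fixed)} \\
L(\theta) &= \tfrac{1}{2}\bigl(y_t - V_\theta(s_t)\bigr)^2 = \tfrac{1}{2}\,\delta_t^2 \\
\theta &\leftarrow \theta + \alpha\, \delta_t\, \nabla_\theta V_\theta(s_t)
\end{align*}
```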
lecture
videolectures.net
https://videolectures.net/videos/deeplearning2017_sutton_td_learning

Seonglae Cho