TD learning
Computes the error (δ) as the difference between the target "observed reward + next-state value" and the current predicted value:
- State‑value TD (V-learning): $\delta_t = r_t + V(s_{t+1}) - V(s_t)$
- Action‑value TD (Q-learning/SARSA): $\delta_t = r_t + Q(s_{t+1}, a') - Q(s_t, a_t)$, where $a' = \arg\max_a Q(s_{t+1}, a)$ for Q-learning and the actually chosen next action $a_{t+1}$ for SARSA
- Advantage: $A(s, a) = Q(s, a) - V(s)$, which isolates the effect of the action from the value of the state itself
"Bootstrapped" estimation: instead of waiting for the full return, each step updates immediately using the current estimate of the next state's value (a minimal sketch follows).
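A minimal tabular sketch of the state-value case, assuming a toy environment with a `reset()` / `step(a) -> (s_next, r, done)` interface and a `policy(s)` callable (none of which come from the note); `gamma` is the discount factor, with `gamma=1.0` matching the undiscounted form above:

```python
def td0(env, policy, episodes=500, alpha=0.1, gamma=1.0):
    """Tabular TD(0): bootstrapped, per-step updates of V."""
    V = {}  # state -> value estimate (default 0.0)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            # TD target bootstraps on the current estimate of the next state
            target = r + (0.0 if done else gamma * V.get(s_next, 0.0))
            delta = target - V.get(s, 0.0)        # TD error
            V[s] = V.get(s, 0.0) + alpha * delta  # update immediately, no full return needed
            s = s_next
    return V
```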
Temporal difference target (TD target)
$y_i = r_t^i + V_\phi^{\pi_\theta}(s_{t+1}^i)$
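A sketch of how this target can be formed in practice, assuming a PyTorch value network `value_net` and batched tensors (the names and the discount `gamma` are illustrative, not from the note); the bootstrap term is held fixed so the target acts as a constant regression label:

```python
import torch

def td_targets(rewards, next_states, value_net, gamma=1.0):
    # y_i = r_i + gamma * V_phi(s'_i); no_grad freezes the bootstrap
    # estimate so gradients do not flow through the target.
    with torch.no_grad():
        next_values = value_net(next_states).squeeze(-1)
    return rewards + gamma * next_values
```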
Supervised regression onto the TD targets (squared TD error loss)
$\mathcal{L}(\phi) = \frac{1}{2} \sum_i \left\| V_\phi^{\pi_\theta}(s_i) - y_i \right\|^2$
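Continuing the sketch above, one regression step on a batch (an `optimizer` over `value_net`'s parameters is assumed):

```python
def fit_value(value_net, optimizer, states, targets):
    values = value_net(states).squeeze(-1)
    # L(phi) = 1/2 * sum_i ||V_phi(s_i) - y_i||^2
    loss = 0.5 * ((values - targets) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```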