Learning both a Q function and an explicit policy at once is cumbersome, so learn only the Q function; there is no dependency on policy gradients.
- Collect a dataset of transitions (s_i, a_i, s'_i, r_i) by rolling out any policy (off-policy data is fine)
- For each data point, set the target y_i ← r(s_i, a_i) + γ max_{a'} Q_φ(s'_i, a') and fit Q_φ(s_i, a_i) to y_i by minimizing the squared TD error (see the sketch after this list)
- The policy improves without an explicit update on π, which is defined implicitly via the greedy argmax: π(s) = argmax_a Q_φ(s, a)
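As a minimal sketch of the loop described above, the PyTorch code below performs one fitted-Q regression step on a batch of off-policy transitions. All names (q_net, fitted_q_update), the network shape, and the random placeholder batch are illustrative assumptions, not something specified in these notes.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for a small discrete-action task.
STATE_DIM, NUM_ACTIONS, GAMMA = 4, 2, 0.99

# One Q network: state -> Q(s, a) for every discrete action.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, NUM_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def fitted_q_update(batch):
    """One fitted-Q step on a batch of off-policy transitions."""
    s, a, r, s_next, done = batch  # collected by rolling out any policy

    # Target: y = r + gamma * max_a' Q(s', a'); no gradient flows through it.
    with torch.no_grad():
        y = r + GAMMA * (1.0 - done) * q_net(s_next).max(dim=1).values

    # Regress Q(s, a) toward y (squared TD error).
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = ((q_sa - y) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with random placeholder data standing in for a replay buffer.
batch = (
    torch.randn(32, STATE_DIM),            # s
    torch.randint(0, NUM_ACTIONS, (32,)),  # a
    torch.randn(32),                       # r
    torch.randn(32, STATE_DIM),            # s'
    torch.zeros(32),                       # done flags
)
print(fitted_q_update(batch))
```

Note that the policy never appears explicitly: acting greedily just means taking argmax over the Q network's output.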
Properties
- Only one Q network to learn, no high-variance policy gradient
- However, the target itself is computed from the network being trained (moving target), so naturally there are no convergence guarantees; it requires lots of tricks to make it work (see the sketch after this list)
- Evaluating Q-values for all possible actions is infeasible with a continuous action space (can be harder to learn than just a policy)
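To make the moving-target point concrete: y depends on Q_φ, which changes after every gradient step, so the regression target keeps shifting. The notes do not say which tricks are meant; as one hedged illustration, the sketch below (hypothetical names, same network shape as above) computes targets from a frozen copy of the Q network that is only refreshed periodically, which keeps the target fixed between refreshes.

```python
import copy
import torch
import torch.nn as nn

GAMMA = 0.99

# Illustrative Q network (same shape assumptions as the sketch above).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Frozen copy used only for computing targets.
target_net = copy.deepcopy(q_net)

def td_targets(r, s_next, done):
    # Targets come from the frozen copy, so they stay fixed between refreshes.
    with torch.no_grad():
        return r + GAMMA * (1.0 - done) * target_net(s_next).max(dim=1).values

def refresh_target(step, period=1000):
    # Copy the online weights into the frozen network every `period` updates.
    if step % period == 0:
        target_net.load_state_dict(q_net.state_dict())
```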
Value Iteration & Q Iteration
Q-Learning
Q Learning Notion