Q learning

Creator: Seonglae Cho
Created: 2024 Apr 27 10:1
Edited: 2024 Apr 27 16:21

Considering both the Q function and the policy at the same time is difficult, so Q learning removes the dependency on the policy gradient.

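  1. Collect a dataset $\{(s_i, a_i, s'_i, r_i)\}$ by rolling out any policy
  2. Set the target $y_i \leftarrow r(s_i, a_i) + \gamma \max_{a'} Q_\phi(s'_i, a')$ and fit $Q_\phi(s_i, a_i)$ to $y_i$ for each data point
    1. $\phi \leftarrow \arg\min_\phi \frac{1}{2} \sum_i \| Q_\phi(s_i, a_i) - y_i \|^2$ (the residual $y_i - Q_\phi(s_i, a_i)$ is the TD error)
  3. No explicit update is made on the policy $\pi$, which is defined implicitly using $Q_\phi$
    1. $\pi(s) = \arg\max_a Q_\phi(s, a)$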
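
The three steps above can be sketched in a few lines. This is only a minimal illustration, not a reference implementation: it assumes a hypothetical four-state chain environment (`step`), a uniformly random behavior policy, and a tabular Q in place of a Q network, so "fitting" reduces to moving each table entry toward its target.

```python
import random

GAMMA = 0.9
N_STATES = 4      # hypothetical toy chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(s, a):
    """Toy deterministic environment: returns (next_state, reward, done)."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def collect(n_transitions, rng):
    """Step 1: roll out an arbitrary (here uniformly random) policy."""
    data, s = [], 0
    for _ in range(n_transitions):
        a = rng.choice(ACTIONS)
        s_next, r, done = step(s, a)
        data.append((s, a, r, s_next, done))
        s = 0 if done else s_next  # restart the episode at state 0
    return data


def fit(data, sweeps=200, lr=0.5):
    """Step 2: set the target y_i and move Q(s_i, a_i) toward it."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s_next, done in data:
            # Target uses the max over next actions, not the behavior policy.
            y = r if done else r + GAMMA * max(q[s_next])
            q[s][a] += lr * (y - q[s][a])  # (y - Q) is the TD error
    return q


def greedy(q, s):
    """Step 3: the policy is never updated directly; it is the argmax of Q."""
    return max(ACTIONS, key=lambda a: q[s][a])


rng = random.Random(0)
q = fit(collect(500, rng))
policy = [greedy(q, s) for s in range(N_STATES - 1)]
```

Because the target takes a max over next actions, the data can come from any policy; the greedy policy is only read off from the learned table at the end.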

Properties

  • Only one Q network to learn, no high-variance policy gradient
  • However, because one of the two is held fixed, there are naturally no convergence guarantees (moving target), and it requires lots of tricks to make it work
  • Evaluating Q-values for all possible actions is infeasible in a continuous action space, so the max over actions is hard to compute (Q could be harder to learn than just a policy)

Value Iteration & Q Iteration
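
As a rough illustration of how the two relate, the sketch below runs both backups on a hypothetical four-state chain whose dynamics (`model`) are assumed to be known. Value iteration backs up $V(s) \leftarrow \max_a [r + \gamma V(s')]$, while Q iteration backs up $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$. The environment, names, and constants are illustrative assumptions.

```python
GAMMA = 0.9
STATES = (0, 1, 2, 3)  # hypothetical toy chain; state 3 is terminal
TERMINAL = 3
ACTIONS = (0, 1)       # 0 = move left, 1 = move right


def model(s, a):
    """Assumed known deterministic dynamics: returns (next_state, reward)."""
    s_next = max(0, s - 1) if a == 0 else min(TERMINAL, s + 1)
    return s_next, (1.0 if s_next == TERMINAL else 0.0)


def value_iteration(n_iters=100):
    """V(s) <- max_a [ r(s, a) + gamma * V(s') ]."""
    v = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        new_v = {TERMINAL: 0.0}
        for s in STATES:
            if s == TERMINAL:
                continue
            backups = []
            for a in ACTIONS:
                s_next, r = model(s, a)
                backups.append(r + GAMMA * v[s_next])
            new_v[s] = max(backups)
        v = new_v
    return v


def q_iteration(n_iters=100):
    """Q(s, a) <- r(s, a) + gamma * max_a' Q(s', a')."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_iters):
        new_q = {(TERMINAL, a): 0.0 for a in ACTIONS}
        for s in STATES:
            if s == TERMINAL:
                continue
            for a in ACTIONS:
                s_next, r = model(s, a)
                new_q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        q = new_q
    return q


v = value_iteration()
q = q_iteration()
```

On this toy problem the two agree in the sense that $\max_a Q(s, a) = V(s)$ for every state. The practical difference is that the Q iteration backup needs only sampled transitions $(s, a, r, s')$, whereas the max over actions in value iteration needs the model for every action.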
