Action-value function

Creator: Seonglae Cho
Created: 2024 Mar 22 3:08
Edited: 2025 May 29 22:47

Q function, Q-value, state-action value function

$$Q^\pi(s_t, a_t) = \sum_{t'=t}^{T} E_\pi\left[ r(s_{t'}, a_{t'}) \mid s_t, a_t \right]$$
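To read this definition operationally, Q^π(s_t, a_t) can be estimated by taking a_t in s_t, rolling out π afterwards, and averaging the summed rewards. The sketch below assumes a Gym-style environment with a hypothetical `reset_to` helper and a `policy` callable; none of these names come from the original note.

```python
import numpy as np

def mc_q_estimate(env, policy, s_t, a_t, horizon, n_rollouts=100):
    """Monte Carlo estimate of Q^pi(s_t, a_t): average the sum of rewards
    obtained by taking a_t in s_t and then following pi until the horizon."""
    returns = []
    for _ in range(n_rollouts):
        s = env.reset_to(s_t)          # assumed helper: start each rollout at s_t
        total, a = 0.0, a_t            # the first action is fixed to a_t
        for _ in range(horizon):
            s, r, done, _ = env.step(a)
            total += r                 # undiscounted sum, matching the finite-horizon definition
            if done:
                break
            a = policy(s)              # subsequent actions come from pi
        returns.append(total)
    return np.mean(returns)
```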
In practice, the Q-value is harder to fit than the value function.
  1. Linear value function: fit the weights with least squares / regression, minimizing the error.
 
We only need to fit V, since Q^\pi(s_t, a_t) = r(s_t, a_t) + E[V^\pi(s_{t+1})] can then be recovered from the fitted value function.
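A minimal sketch of this "fit V with least squares, then recover Q" idea under a linear value function V(s) ≈ wᵀφ(s). The feature map `phi`, the regression targets, and the sampled next states are illustrative assumptions, not part of the original note.

```python
import numpy as np

def fit_linear_V(phi, states, target_returns):
    """Least-squares fit of a linear value function V(s) ~= w^T phi(s).
    `target_returns` are e.g. observed Monte Carlo returns from each state."""
    Phi = np.stack([phi(s) for s in states])              # design matrix, one row per state
    w, *_ = np.linalg.lstsq(Phi, np.asarray(target_returns), rcond=None)
    return w

def q_from_v(w, phi, r, next_state_samples):
    """Recover Q(s, a) = r(s, a) + E[V(s')] from the fitted V,
    using sampled next states s' ~ p(s' | s, a)."""
    v_next = np.mean([phi(s2) @ w for s2 in next_state_samples])
    return r + v_next
```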
A Q-table stores how good each action is in each state.
Generalizing across states with a Q-table is hard because the number of possible states is enormous. The fix is to generalize from previously experienced situations to new, similar ones using feature-based representations, a fundamental idea that recurs throughout machine learning.
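As a sketch of feature-based generalization, the per-state entries of a Q-table can be replaced by a single weight vector over state-action features, so that similar states share their value estimates. The feature function and learning rate below are illustrative assumptions, not part of the note.

```python
import numpy as np

class LinearQ:
    """Feature-based Q approximation: Q(s, a) ~= w^T f(s, a).
    Unlike a Q-table, unseen but similar states reuse the same weights."""
    def __init__(self, feature_fn, n_features, lr=0.1):
        self.f = feature_fn            # maps (state, action) -> feature vector
        self.w = np.zeros(n_features)
        self.lr = lr

    def q(self, s, a):
        return self.f(s, a) @ self.w

    def update(self, s, a, target):
        # One gradient step on the squared error (target - Q(s, a))^2
        td_error = target - self.q(s, a)
        self.w += self.lr * td_error * self.f(s, a)
```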
Action-value estimations