
State-value function

Creator: Seonglae Cho
Created: 2023 Jul 18 9:0
Editor: Seonglae Cho
Edited: 2025 Mar 21 11:34
Refs: Action-value function

V-function

The expectation of the Q-function (Action-value function) over the actions chosen by the policy. Equivalently, it is the expected return starting from a particular state and following a given policy from then on.
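In symbols, the definition above can be written as follows. The notation is an assumption, since the page does not fix one: $\pi$ is the policy, $G_t$ is the discounted return with discount factor $\gamma$, $R_t$ is the reward, and $Q^{\pi}$ is the action-value function. The sum form applies to a discrete action space.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]
           = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ Q^{\pi}(s, a) \right]
           = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a),
\qquad
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```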
State-value estimations
Policy Gradient Baseline
 
 
 
The state-value function is the expectation of the action-value function taken over the policy's action distribution.
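As a minimal sketch of this relationship for a tabular problem, the snippet below weights each row of a Q-table by the policy's action probabilities. The function name `state_values` and the toy numbers are hypothetical, chosen only for illustration, and a discrete action space is assumed.

```python
def state_values(q, pi):
    """Return V[s] = sum_a pi[s][a] * q[s][a] for every state s.

    q  : list of rows, q[s][a] is the action value of action a in state s
    pi : list of rows, pi[s][a] is the probability of action a in state s
    """
    values = []
    for q_row, pi_row in zip(q, pi):
        # Each policy row must be a probability distribution over actions.
        if abs(sum(pi_row) - 1.0) > 1e-9:
            raise ValueError("each policy row must sum to 1")
        values.append(sum(p * qa for p, qa in zip(pi_row, q_row)))
    return values


# Hypothetical toy example: 2 states, 3 actions.
q = [[1.0, 2.0, 3.0],
     [0.0, 4.0, -2.0]]
pi = [[0.5, 0.25, 0.25],   # stochastic policy in state 0
      [0.0, 1.0, 0.0]]     # deterministic policy in state 1

v = state_values(q, pi)
print(v)  # [1.75, 4.0]
```

For the deterministic row, the state value equals the Q-value of the single action the policy selects.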
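(3) Value functions and the Bellman equation
In the previous chapter, we looked at how to define a problem as an MDP. Now let's dig into the value function and the Q-function, along with the Bellman expectation equation and the Bellman optimality equation.
https://jang-inspiration.com/bellman-equation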
 

Backlinks

Temporal difference learning
Tabular Ergodic MDP
Bellman Expectation Equation
Policy Gradient Theorem
Actor Critic

Copyright Seonglae Cho