Reward to Go

Creator
Seonglae Cho
Created
2024 Mar 20 1:49
Edited
2025 May 29 22:47
In Reward Maximization, we sum only the rewards from the current timestep onward (causality: an action cannot affect rewards that came before it).
Dropping the earlier rewards from the equation is valid because their expected contribution to the gradient is 0.
However, their variance is not 0, so removing them reduces the variance of the gradient estimator.
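In symbols, the reward-to-go policy gradient takes this standard form (indices i = 1..N over sampled trajectories and horizon T are notational assumptions, not from this note):

```latex
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T}
\nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t})
\left( \sum_{t'=t}^{T} r(s_{i,t'}, a_{i,t'}) \right)
```

The inner sum starts at t' = t rather than t' = 1, which is exactly the "reward to go" replacing the full trajectory return.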

Two notations

Reward to go

true expected reward to-go: Q^π(s_t, a_t), the expected sum of future rewards under the policy
estimated expected reward to-go: the single-sample Monte Carlo return Σ_{t'=t}^T r(s_{t'}, a_{t'})
A better estimate of Q gives a lower-variance gradient, so prefer the true Q (or a learned approximation of it) over the single-sample estimate when available.
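The single-sample estimate above can be computed with one backward pass over a trajectory's rewards. A minimal sketch (the function name and the discount parameter `gamma` are illustrative assumptions; set `gamma=1.0` for the undiscounted sum used above):

```python
import numpy as np

def reward_to_go(rewards, gamma=1.0):
    """Single-sample estimate of Q: (discounted) sum of rewards from t onward."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    # Walk backward so each entry reuses the suffix sum already computed.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# rewards [1, 0, 2] with gamma=1 → reward-to-go [3, 2, 2]
print(reward_to_go([1.0, 0.0, 2.0]))
```

Each entry `rtg[t]` weights the score function at timestep t, replacing the full-trajectory return in the vanilla policy gradient.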