GAE

Creator
Seonglae Cho
Created
2024 Mar 20 2:08
Edited
2025 Feb 4 10:21
Refs

Generalized advantage estimator, N-step return

  • Normalizes the magnitude of the advantage
  • Recently, reward/return normalization is preferred over advantage normalization
    • The sign of the advantage matters: it determines the direction of the policy update (whether an action's probability is pushed up or down), and mean-subtracting normalization can flip the sign of A.
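The sign-flip problem above can be seen in a minimal sketch (the array values are made up for illustration):

```python
import numpy as np

# All advantages are positive: every sampled action was better than V(s),
# so the policy gradient should push all of their probabilities up.
adv = np.array([0.1, 0.2, 0.3])

# Mean-std normalization recenters around zero, flipping the sign of the
# below-average entries -- those actions would now be pushed *down*.
normalized = (adv - adv.mean()) / (adv.std() + 1e-8)
print(normalized)  # first entry becomes negative
```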

Finding a sweet spot with n-step returns

  • Problem: it’s hard to know which n gives the best advantage estimate
  • Solution: use an exponentially-weighted average of the n-step returns!
New hyperparameter: the discounting factor λ (typically λ ≈ 0.95 works well)
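The exponentially-weighted average can be computed efficiently in a single backward pass over one-step TD errors. A minimal sketch (the `gae` function name and signature are my own, not from a specific library):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    rewards: array of r_t, length T
    values:  array of V(s_t), length T+1 (last entry bootstraps the final state)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # one-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # exponentially-weighted sum of future TD errors
        last = delta + gamma * lam * last
        advantages[t] = last
    return advantages
```

With gamma = lam = 1 and a zero value function this reduces to the plain return-to-go, which is a quick sanity check.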

Discount factor γ and GAE’s λ

The lambda parameter determines a trade-off between more bias (low lambda) and more variance (high lambda).
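One way to see this trade-off: GAE weights the n-step returns by (1 − λ)λ^(n−1), so low λ concentrates weight on short, bootstrapped (biased, low-variance) returns, while high λ spreads weight toward long, Monte-Carlo-like (unbiased, high-variance) returns. A quick sketch of the weights:

```python
import numpy as np

lam = 0.95
# weight on the n-step return, n = 1..10
weights = (1 - lam) * lam ** np.arange(10)
print(weights)  # geometrically decaying, heaviest on the 1-step return
```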
Setting gamma and lambda in Reinforcement Learning
In any of the standard Reinforcement learning algorithms that use generalized temporal differencing (e.g. SARSA, Q-learning), the question arises as to what values to use for the lambda and gamma h...

Recommendations