GAE

Created
2024 Mar 20 2:8
Creator
Seonglae Cho
Edited
2024 Apr 30 8:1

Generalized advantage estimator, N-step return

  • Normalizing the magnitude of the advantage
  • Recently, reward/return normalization has been preferred over advantage normalization
    • The sign of the advantage matters because it determines whether an action’s probability is pushed up or down, and advantage normalization subtracts the batch mean, which can flip the sign of A.
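
The sign issue can be seen on a tiny batch. The sketch below is only an illustration: the advantage values are made up, and the "scale-only" variant is a hypothetical stand-in for any normalization that divides by a positive scale without centering, not a specific method named on this page.

```python
import numpy as np

# Hypothetical batch in which every advantage is positive,
# i.e. every sampled action did better than the baseline.
advantages = np.array([0.5, 1.0, 2.0, 4.0])

# Standard advantage normalization: subtract the batch mean, divide by the batch std.
normalized = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

# Scale-only variant (assumption for illustration): divide by a positive scale, no centering.
scaled = advantages / (advantages.std() + 1e-8)

print("original signs:  ", np.sign(advantages))
print("normalized signs:", np.sign(normalized))
print("scaled signs:    ", np.sign(scaled))
```

After centering, the two below-average advantages become negative even though the underlying actions were better than the baseline, while dividing by a positive scale leaves every sign unchanged.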

Finding a sweet spot through n-step returns

  • Problem: It’s hard to know which n is good for advantage estimation
  • Solution: Use an exponentially weighted average of the n-step returns!
New hyperparameter: the discounting factor λ (values just below 1 typically work well)
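
For reference, the exponentially weighted average can be written out as follows. The notation is the usual one for GAE and is an assumption here rather than something stated on this page: δ_t is the one-step TD residual, V is the value estimate, γ is the discount factor, and λ is the new weighting hyperparameter.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma V(s_{t+1}) - V(s_t) \\
\hat{A}_t^{(n)} &= \sum_{l=0}^{n-1} \gamma^{l}\,\delta_{t+l}
  = -V(s_t) + r_t + \gamma r_{t+1} + \dots + \gamma^{n-1} r_{t+n-1} + \gamma^{n} V(s_{t+n}) \\
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1}\,\hat{A}_t^{(n)}
  = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\,\delta_{t+l}
\end{aligned}
```

The second line is the n-step advantage estimate, and the third line averages all of them with weights that decay geometrically in n, so no single n has to be picked.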
 
 
 

Discount factor γ and GAE’s λ

The lambda parameter determines a trade-off between more bias (low lambda) and more variance (high lambda).
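
The trade-off can be checked numerically with a minimal sketch of the backward GAE recursion. The function name `compute_gae`, the toy trajectory, and the convention that `values` carries one extra bootstrap entry are all assumptions made for this example.

```python
import numpy as np

def compute_gae(rewards, values, gamma, lam):
    """Sketch of GAE for one finite trajectory (hypothetical helper).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), length T + 1; the last entry is the
             bootstrap value (0.0 if the trajectory ended in a terminal state).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros(len(rewards))
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy trajectory with made-up numbers; the episode terminates, so V(s_T) = 0.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]
gamma = 0.9

adv_low = compute_gae(rewards, values, gamma=gamma, lam=0.0)   # one-step TD residuals
adv_high = compute_gae(rewards, values, gamma=gamma, lam=1.0)  # discounted return minus V(s_t)
print(adv_low)
print(adv_high)
```

With `lam=0.0` each advantage is just the one-step TD residual, which leans entirely on the value estimate (more bias). With `lam=1.0` it becomes the discounted return minus V(s_t), which leans on the sampled rewards (more variance).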
 
 
 

Recommendations