Generalized advantage estimator (GAE) and n-step returns
- Normalizing the magnitude of the advantage
- For the advantage function, the std matters more than the mean, since the mean is already near 0.
- Recently, reward/return normalization has been preferred over advantage normalization.
- The sign of the advantage is important because it decides whether an action is reinforced or discouraged, so mean-subtracting normalization can flip the sign of the A value (see the sketch after this list).
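A minimal NumPy sketch of the sign issue (the function name and the advantage values are illustrative assumptions, not from these notes): scaling by the std alone preserves every sign, while full standardization with mean subtraction can flip a small positive advantage to negative.

```python
import numpy as np

def normalize_advantages(adv: np.ndarray, subtract_mean: bool) -> np.ndarray:
    """Scale advantages by their std; optionally also center them first."""
    centered = adv - adv.mean() if subtract_mean else adv
    return centered / (adv.std() + 1e-8)  # epsilon avoids division by zero

adv = np.array([0.05, 0.90, -0.80, 0.25])  # hypothetical batch, mean near 0

std_only = normalize_advantages(adv, subtract_mean=False)
full = normalize_advantages(adv, subtract_mean=True)

# std-only scaling keeps every sign, so "good" actions stay reinforced:
print(np.sign(std_only) == np.sign(adv))   # [ True  True  True  True]
# mean subtraction (mean is 0.1 here) flips the small 0.05 advantage:
print(np.sign(full) == np.sign(adv))       # [False  True  True  True]
```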
Finding a sweet spot through n-step returns
- Problem: it's hard to know which n gives a good advantage estimate.
- Solution: use an exponentially-weighted average of the n-step returns, with weights decaying for returns that look further into the future (see the sketch below)!
New hyperparameter: the decay factor λ (values around 0.95 typically work well)
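A short sketch of the resulting estimator, GAE, computed in the usual backward recursion over TD residuals (the function and the toy trajectory are illustrative assumptions, not from these notes; terminal states are ignored for brevity):

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one (non-terminal) trajectory.

    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_{T-1});
    last_value: bootstrap V(s_T) for the final step.
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially-weighted recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    return adv

# Toy trajectory; with lam=0 the output reduces to the plain TD residuals.
rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6])
print(gae(rewards, values, last_value=0.3))
```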
Discount factor γ and GAE's λ
The λ parameter determines a bias-variance trade-off: low λ gives more bias, high λ gives more variance.
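To make the trade-off concrete, here is the standard GAE definition written out (reconstructed from the standard formulation, not copied from these notes):

```latex
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)}
  = \sum_{l=0}^{\infty} (\gamma\lambda)^l \,\delta_{t+l},
\qquad
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
```

With λ = 0 this collapses to the one-step TD residual δ_t (more bias, less variance); with λ = 1 it telescopes to the full discounted return minus the baseline, Σ_l γ^l r_{t+l} − V(s_t), i.e. a Monte Carlo estimate (less bias, more variance).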