Generalized Advantage Estimator (GAE), n-step return
- Normalizing the magnitude of the advantage
- For the advantage function, the standard deviation matters more than the mean, since the mean is already near zero
- Recently, reward/return normalization has been preferred over advantage normalization
- The sign of the advantage is important because it determines the direction of the policy update, so mean-subtracting advantage normalization can flip the sign of A (see the sketch below)
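
A minimal NumPy sketch of the two choices; `normalize_advantages`, `std_only`, and `eps` are illustrative names, not from any particular library:

```python
import numpy as np

def normalize_advantages(adv, std_only=True, eps=1e-8):
    """Rescale a batch of advantages before the policy update."""
    adv = np.asarray(adv, dtype=np.float64)
    if std_only:
        # Dividing by the std only shrinks magnitudes while preserving
        # every sign, i.e. the direction of each policy update.
        return adv / (adv.std() + eps)
    # Full standardization also subtracts the mean, which can flip the
    # sign of an advantage and reverse its update direction.
    return (adv - adv.mean()) / (adv.std() + eps)
```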

Finding a sweet spot with n-step returns
- Problem: it is hard to know which n gives the best advantage estimate
- Solution: use an exponentially-weighted average of n-step returns! (This is GAE; see the formulas below)
New hyperparameter: the decay factor λ (λ ≈ 0.95 typically works well)
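
Written out with the TD error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, the n-step advantage and its exponentially-weighted average (GAE, Schulman et al. 2015) are:

$$
\hat{A}_t^{(n)} = \sum_{l=0}^{n-1} \gamma^l r_{t+l} + \gamma^n V(s_{t+n}) - V(s_t)
$$

$$
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} \hat{A}_t^{(n)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}
$$

λ = 0 recovers the one-step TD advantage $\delta_t$, and λ = 1 the Monte Carlo advantage, which is exactly the trade-off below.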
Discount factor γ and GAE's λ
The λ parameter determines a trade-off between more bias (low λ) and more variance (high λ); a backward-recursive implementation is sketched below.
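
A minimal NumPy sketch of the standard backward recursion, assuming `rewards` and `dones` of length T and `values` of length T + 1 (the extra entry is the bootstrap value of the final state); the function name and defaults are illustrative:

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """GAE via backward recursion: A_t = delta_t + gamma * lam * A_{t+1}."""
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - float(dones[t])
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # lam = 0 keeps only delta_t (more bias); lam = 1 sums all future
        # deltas, i.e. the Monte Carlo advantage (more variance).
        last_adv = delta + gamma * lam * nonterminal * last_adv
        advantages[t] = last_adv
    return advantages
```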
Setting gamma and lambda in Reinforcement Learning
https://stackoverflow.com/questions/17336069/setting-gamma-and-lambda-in-reinforcement-learning

Seonglae Cho