Most commonly used in practice
How much better is an action than what the policy would do on average in that state?
A(s, a) = Q(s, a) - V(s)
Can be positive (better than the policy's average) or negative (worse than average)


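- averaged over actions drawn from the policy it is exactly 0 in expectation, since V(s) = E_{a~π}[Q(s, a)]; advantage estimates therefore usually average near 0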
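A minimal sketch of the definition on a single hypothetical state with three made-up actions (the names `q_values`, `policy`, `state_value` and `advantages` are illustrative, not from the source). It computes V(s) as the policy-weighted mean of Q, subtracts it to get A(s, a), and checks that the policy-weighted mean of A is 0.

```python
# Hypothetical single-state example: action values Q(s, a) and a policy pi(a | s).
q_values = {"left": 1.0, "right": 3.0, "stay": 2.0}
policy = {"left": 0.25, "right": 0.25, "stay": 0.5}


def state_value(q, pi):
    """V(s) = E_{a ~ pi}[Q(s, a)]."""
    return sum(pi[a] * q[a] for a in q)


def advantages(q, pi):
    """A(s, a) = Q(s, a) - V(s) for every action."""
    v = state_value(q, pi)
    return {a: q[a] - v for a in q}


adv = advantages(q_values, policy)
# Policy-weighted mean of the advantage; 0 by construction.
mean_adv = sum(policy[a] * adv[a] for a in adv)

print(state_value(q_values, policy))  # 2.0
print(adv)                            # {'left': -1.0, 'right': 1.0, 'stay': 0.0}
print(mean_adv)                       # 0.0
```

The sign tells you whether an action beats the policy's average, which is why the advantage is the usual weighting in policy-gradient updates.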
Advantage estimation
One can show that the advantage-weighted objective approximates the KL-constrained policy improvement objective.
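A sketch of the standard argument, assuming a finite action set and a KL penalty coefficient β (symbols chosen here for illustration). Maximizing E_{a~π}[A(s, a)] - β KL(π || π_old) over distributions π has the closed-form solution π*(a|s) ∝ π_old(a|s) exp(A(s, a) / β). Projecting π* onto a parametric policy by minimizing KL(π* || π_θ) gives a maximum-likelihood loss on samples from π_old weighted by exp(A / β), which is the advantage-weighted objective. The weight clipping in `awr_weights` is an assumed practical detail, not something the source states.

```python
import math


def kl_constrained_optimum(pi_old, adv, beta):
    """Closed-form maximizer of E_{a~pi}[A(s,a)] - beta * KL(pi || pi_old)
    over distributions on a finite action set:
    pi*(a) is proportional to pi_old(a) * exp(A(s,a) / beta)."""
    # Subtract the max advantage before exponentiating for numerical stability;
    # the constant cancels in the normalization.
    m = max(adv.values())
    unnorm = {a: pi_old[a] * math.exp((adv[a] - m) / beta) for a in pi_old}
    z = sum(unnorm.values())
    return {a: w / z for a, w in unnorm.items()}


def awr_weights(advs, beta, max_weight=20.0):
    """Per-sample weights exp(A / beta) for a weighted log-likelihood loss.
    Clipping at max_weight is an assumption added to keep weights bounded."""
    return [min(math.exp(a / beta), max_weight) for a in advs]


pi_old = {"left": 0.25, "right": 0.25, "stay": 0.5}
adv = {"left": -1.0, "right": 1.0, "stay": 0.0}

print(kl_constrained_optimum(pi_old, adv, beta=1.0))   # mass shifts toward "right"
print(kl_constrained_optimum(pi_old, adv, beta=1e6))   # almost unchanged from pi_old
print(awr_weights([-1.0, 1.0, 0.0], beta=1.0))
```

β trades off improvement against staying close to π_old: a large β keeps the new policy near π_old, while a small β makes it nearly greedy with respect to A.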

Seonglae Cho

