A technique that makes a random operation differentiable by rewriting it in terms of different parameters
Reparameterized gradients can result in significantly lower gradient variance
In traditional stochastic gradient estimation methods like REINFORCE, the variance of the gradient estimate can be high because it relies on samples drawn from the policy distribution, and the gradient enters only through the log-probability of those samples. This injects a lot of noise into the gradient estimates.
The reparameterization trick (used in methods like SAC with reparameterized gradients) expresses the random variable as a deterministic, differentiable function of the distribution's parameters and an independent, parameter-free noise variable. Gradients can then be computed through this deterministic path, reducing the variance associated with the stochasticity.
By reparameterizing, the source of randomness is isolated in a noise variable that does not depend on the parameters: the parameters enter the sample only through a deterministic, differentiable path. This is what makes the resulting gradients smoother and lower-variance, as the sketch below illustrates.
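As a rough illustration, here is a minimal PyTorch sketch (the quadratic objective `f`, the sample sizes, and all variable names are illustrative choices, not from the source) that estimates the same gradient with a score-function (REINFORCE-style) estimator and with the reparameterization trick, then compares the spread of the two estimates:

```python
import torch

torch.manual_seed(0)

mu = torch.tensor(1.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

def f(z):
    # Arbitrary differentiable objective whose expectation we want to optimize.
    return (z - 3.0) ** 2

def score_function_grad(n):
    # REINFORCE-style: grad E[f(z)] = E[f(z) * d log p(z; theta) / d theta].
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    z = dist.sample((n,))  # sampling cuts the gradient path through z itself
    surrogate = (f(z) * dist.log_prob(z)).mean()
    return torch.autograd.grad(surrogate, mu)[0]

def reparameterized_grad(n):
    # Pathwise: z = mu + sigma * eps with eps ~ N(0, 1); gradient flows through z.
    eps = torch.randn(n)
    z = mu + log_sigma.exp() * eps
    return torch.autograd.grad(f(z).mean(), mu)[0]

# Spread of single-batch estimates of d E[f(z)] / d mu (true value: 2 * (mu - 3)).
sf = torch.stack([score_function_grad(32) for _ in range(200)])
rp = torch.stack([reparameterized_grad(32) for _ in range(200)])
print(f"score function:  mean={sf.mean():.2f}, std={sf.std():.2f}")
print(f"reparameterized: mean={rp.mean():.2f}, std={rp.std():.2f}")
```

Both estimators agree with the true gradient in expectation, but on a run like this the reparameterized estimate typically shows a much smaller standard deviation.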
To work around the problem that a sampling operation is not differentiable and therefore blocks backprop, the reparameterization trick rewrites the sampling operation in terms of different parameters so that it becomes differentiable.
This covers the case that sampling alone cannot solve, because the expression depends on the distribution's own parameters, which are exactly the ones being differentiated; the sketch below shows the basic substitution.
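For a Gaussian, the standard substitution is z = μ + σ·ε with ε ~ N(0, 1). A minimal PyTorch sketch (variable names are illustrative) of how this restores the gradient path:

```python
import torch
from torch.distributions import Normal

mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(1.2, requires_grad=True)

# Direct sampling: the random op cuts the graph, so mu/sigma get no gradient.
z_plain = Normal(mu, sigma).sample()
print(z_plain.requires_grad)    # False -> backprop stops here

# Reparameterized: randomness lives in eps ~ N(0, 1), which has no parameters.
eps = torch.randn(())
z = mu + sigma * eps            # deterministic, differentiable in mu and sigma
z.backward()
print(mu.grad, sigma.grad)      # dz/dmu = 1, dz/dsigma = eps

# torch.distributions exposes the same trick directly as rsample().
print(Normal(mu, sigma).rsample().requires_grad)  # True
```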
For Variational Inference
Similarly, when a Deterministic node appears before a Stochastic node in a Computational Graph, we use the reparameterization trick to move the stochastic node to a leaf position, enabling Back Propagation.
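A minimal sketch of that graph rewrite in the style of a VAE encoder (the `GaussianEncoder` class and all dimensions here are hypothetical, assuming PyTorch): the noise ε becomes a parameter-free leaf, and z becomes a deterministic function of μ and log σ², so gradients reach the encoder weights.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Toy VAE-style encoder: the stochastic node z is rewritten as a
    deterministic function of (mu, log_var) plus a noise leaf eps, so
    backprop can reach the encoder parameters."""

    def __init__(self, in_dim=8, latent_dim=2):
        super().__init__()
        self.mu_head = nn.Linear(in_dim, latent_dim)
        self.log_var_head = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        mu, log_var = self.mu_head(x), self.log_var_head(x)
        eps = torch.randn_like(mu)            # stochastic leaf, no parameters
        z = mu + (0.5 * log_var).exp() * eps  # deterministic path to mu, log_var
        return z, mu, log_var

enc = GaussianEncoder()
x = torch.randn(4, 8)
z, mu, log_var = enc(x)
z.sum().backward()                          # gradients flow into both heads
print(enc.mu_head.weight.grad is not None)  # True
```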