The first term of the ELBO is for reconstruction.
The second term of the ELBO is for regularization: it keeps the approximate posterior close to the prior.
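For reference, the two terms can be written out as follows. The notes' own symbols are not shown here, so this uses the common convention as an assumption: an encoder q with parameters phi, a decoder p with parameters theta, and a prior p(z).

```latex
\mathcal{L}(\theta, \phi; x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
  \;-\; \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{\text{regularization}}
```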
1. Maximize first term
The first term of the ELBO has no analytic solution because of the integral inside the expectation.
Decoder Part
We can approximate the expectation with the Monte Carlo method, that is, by the sample mean over samples drawn from the approximate posterior.
With this estimate, we can also approximate the derivative w.r.t. the decoder parameters.
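The sample-mean approximation can be sketched as below. This is a toy illustration, not a VAE: the Gaussian sampler stands in for the approximate posterior, `f` stands in for the log-likelihood term, and the names `monte_carlo_expectation`, `sampler`, and `f` are hypothetical.

```python
import random


def monte_carlo_expectation(f, sampler, num_samples):
    """Approximate E[f(z)] by the sample mean of f over draws from sampler."""
    total = 0.0
    for _ in range(num_samples):
        total += f(sampler())
    return total / num_samples


rng = random.Random(0)  # fixed seed so the run is repeatable

# Toy stand-ins (assumptions): z ~ N(mu, sigma^2) plays the role of the
# approximate posterior, and f(z) = -(z - 2)^2 plays the role of a
# log-likelihood term.
mu, sigma = 1.0, 0.5
sampler = lambda: rng.gauss(mu, sigma)
f = lambda z: -(z - 2.0) ** 2

estimate = monte_carlo_expectation(f, sampler, num_samples=100_000)

# Closed form for comparison: E[(z - 2)^2] = (mu - 2)^2 + sigma^2.
exact = -((mu - 2.0) ** 2 + sigma ** 2)
print(estimate, exact)
```

The estimate approaches the closed-form value as the number of samples grows. In a VAE the expectation has no such closed form, which is why the sample mean is used.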
Encoder Part
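We cannot approximate the derivative w.r.t. the encoder parameters in the same way, because the distribution is replaced by its samples and the sampling step is not differentiable w.r.t. those parameters. So we use the re-parametrization trick (the key trick for training a VAE).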
Re-parametrization Trick
Some random variables can be represented as a deterministic function of another random variable.
Any normal distribution can be expressed in terms of the standard normal distribution: if ε is a sample from the standard normal distribution, then μ + σ·ε is a sample from the normal distribution with mean μ and standard deviation σ.
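A minimal sketch of the trick, under stated assumptions: the objective `f(z) = z^2` is a toy choice made here because its expectation has a known closed form, and the function names are hypothetical. The point is that the noise is drawn independently of the parameters, so the chain rule can pass through the sample.

```python
import random


def reparam_sample(mu, sigma, eps):
    """Map standard-normal noise eps to a sample from N(mu, sigma^2)."""
    return mu + sigma * eps


def reparam_gradients(mu, sigma, num_samples, rng):
    """Estimate d/dmu and d/dsigma of E[z^2] with z = mu + sigma * eps.

    f(z) = z^2 is a toy objective (an assumption for illustration).
    By the chain rule, df/dmu = 2z * 1 and df/dsigma = 2z * eps.
    """
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # noise does not depend on mu or sigma
        z = reparam_sample(mu, sigma, eps)
        grad_mu += 2.0 * z
        grad_sigma += 2.0 * z * eps
    return grad_mu / num_samples, grad_sigma / num_samples


rng = random.Random(0)
mu, sigma = 1.5, 0.8
grad_mu, grad_sigma = reparam_gradients(mu, sigma, 200_000, rng)

# E[z^2] = mu^2 + sigma^2, so the exact gradients are 2*mu and 2*sigma.
print(grad_mu, 2 * mu)
print(grad_sigma, 2 * sigma)
```

In an actual VAE the mean and standard deviation would be outputs of the encoder network, and an automatic differentiation library would apply the same chain rule through them to the encoder parameters.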
2. Maximize second term
Maximizing the second term is equivalent to minimizing the KL divergence between the approximate posterior and the prior.
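Unlike the first term, the KL divergence often has a closed form. The sketch below assumes a diagonal Gaussian approximate posterior and a standard normal prior, a common choice that these notes do not state explicitly. The function name and the log-variance parametrization are also assumptions.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log(sigma^2)).
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )


# The divergence is zero when the posterior equals the prior ...
print(kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# ... and positive otherwise.
print(kl_diag_gaussian_to_standard_normal([1.0, -0.5], [0.2, -0.3]))
```

Because this expression is differentiable in the mean and log-variance, no sampling is needed for this term under the stated assumptions.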