Evidence Lower Bound (ELBO)

The variational lower bound shows that the ELBO is a lower bound on the marginal log likelihood $\log p_\theta(x)$. By maximizing the ELBO, we approximately maximize the marginal likelihood $p_\theta(x)$.
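We cannot minimize the KL divergence $\text{KL}(q_\phi(z|x) \| p_\theta(z|x))$ exactly in most cases, because it involves the intractable posterior $p_\theta(z|x)$. Instead, we can maximize a function that differs from the negative KL only by $\log p_\theta(x)$, which does not depend on $q_\phi$. This function is the evidence lower bound.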
$$
\begin{aligned}
\log p_\theta(x) &= \text{KL}(q_\phi(z|x) \| p_\theta(z|x)) + \text{ELBO}(q_\phi(z|x)) \\
\Rightarrow \log p_\theta(x) &\geq \text{ELBO}(q_\phi(z|x))
\end{aligned}
$$

The inequality holds because the KL divergence is non-negative.

Minimize
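KL Divergence between $P(x|\theta)$ and $Q(x)$

The ELBO splits into a left term and a right term. The left term minimizes the KL divergence between $q_\phi(z|x)$ and the prior $p_\theta(z)$, narrowing the distance between the two distributions. The right term maximizes the log likelihood of $x$ given $z$, with $z$ drawn from $q_\phi(z|x)$, which makes reconstruction possible.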
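To make the two terms concrete, here is a minimal sketch on a toy model that is not from these notes: it assumes a prior $p(z) = \mathcal{N}(0, 1)$, a decoder $p(x|z) = \mathcal{N}(x; z, 1)$, and a Gaussian $q(z|x) = \mathcal{N}(\mu, \sigma^2)$. Under these assumptions both terms have closed forms, the marginal is $p(x) = \mathcal{N}(0, 2)$, and the exact posterior is $\mathcal{N}(x/2, 1/2)$. All function names are hypothetical.

```python
import math

def kl_to_standard_normal(mu, var):
    """Closed-form KL( N(mu, var) || N(0, 1) ): the regularization term."""
    return 0.5 * (mu ** 2 + var - 1.0 - math.log(var))

def expected_log_likelihood(x, mu, var):
    """E_{z ~ N(mu, var)}[ log N(x; z, 1) ]: the reconstruction term.

    Uses E[(x - z)^2] = (x - mu)^2 + var for z ~ N(mu, var).
    """
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - mu) ** 2 + var)

def elbo(x, mu, var):
    """ELBO = -KL(q(z|x) || p(z)) + E_q[log p(x|z)]."""
    return expected_log_likelihood(x, mu, var) - kl_to_standard_normal(mu, var)

def log_marginal(x):
    """log p(x) for the assumed toy model, where x ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x ** 2 / 4.0

x = 1.0
# q equal to the prior: the KL term is zero, but reconstruction is poor.
print("q = prior     :", elbo(x, 0.0, 1.0), "<=", log_marginal(x))
# q equal to the exact posterior N(x/2, 1/2): the bound becomes tight.
print("q = posterior :", elbo(x, x / 2.0, 0.5), "==", log_marginal(x))
```

In a real VAE the reconstruction term is usually estimated by sampling $z$ from $q_\phi(z|x)$ rather than computed in closed form; the toy model is chosen only so that every quantity can be checked by hand.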
$$
\begin{aligned}
\text{ELBO}(q_\phi(z|x))
&= \mathbb{E}_{q_\phi(z|x)} \left[ \log p_\theta(z, x) \right] - \mathbb{E}_{q_\phi(z|x)} \left[ \log q_\phi(z|x) \right] \\
&= \mathbb{E}_{q_\phi(z|x)} \left[ - \log q_\phi(z|x) + \log p_\theta(z, x) \right] \\
&= - \text{KL}(q_\phi(z|x) \| p_\theta(z)) + \mathbb{E}_{q_\phi(z|x)} \left[ \log p_\theta(x|z) \right]
\end{aligned}
$$

$$
ELBO(x, Q, \theta) = \sum_z Q(z) \log \frac{p(x, z; \theta)}{Q(z)}
$$

That is,
$$
\log p(x; \theta) \ge ELBO(x, Q, \theta), \quad \forall Q, \theta, x
$$
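The bound can be checked numerically on a small discrete example. The sketch below assumes a made-up joint table $p(x, z; \theta)$ for one fixed $x$ and three latent values (the numbers are illustrative, not from any dataset). It evaluates the summation form of the ELBO for several choices of $Q$ and compares the gap $\log p(x; \theta) - ELBO$ with the KL divergence from $Q$ to the exact posterior.

```python
import math

# Hypothetical joint probabilities p(x, z; theta) for one fixed x, z in {0, 1, 2}.
joint = [0.10, 0.25, 0.05]

def elbo(q, joint):
    """ELBO(x, Q, theta) = sum_z Q(z) * log( p(x, z; theta) / Q(z) )."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(Q || P) = sum_z Q(z) * log( Q(z) / P(z) )."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

evidence = sum(joint)                       # p(x; theta) = sum_z p(x, z; theta)
log_evidence = math.log(evidence)
posterior = [pz / evidence for pz in joint]  # p(z | x; theta)

candidates = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "skewed": [0.7, 0.2, 0.1],
    "posterior": posterior,
}

for name, q in candidates.items():
    bound = elbo(q, joint)
    gap = log_evidence - bound
    print(f"{name:9s} ELBO={bound:.4f}  log p(x)={log_evidence:.4f}  "
          f"gap={gap:.4f}  KL(Q||posterior)={kl(q, posterior):.4f}")
```

For every $Q$ the ELBO stays at or below $\log p(x; \theta)$, and the gap matches the KL divergence to the posterior, so it vanishes only when $Q$ is the posterior itself.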