KL Divergence

Creator: Seonglae Cho
Created: 2023 Mar 23 2:20
Edited: 2025 Mar 9 23:59

Relative Entropy, Kullback-Leibler Divergence, I-divergence

An asymmetric measure for comparing two probability distributions (not a true metric).

The divergence is large when $q$ is small where $p$ is large (the ratio $p/q$ inside the log blows up), so a larger probability difference produces a larger divergence.
$D_{KL}(p(x) \| q(x)) = \mathbb{E}_{x \sim p(x)}\left[\log \frac{p(x)}{q(x)}\right]$
  • Equals 0 when the two distributions are identical, and is greater than 0 otherwise (the gap between expected and observed information: prior vs. posterior)
  • "Divergence" here simply means difference
  • It diverges to $\infty$ if the supports do not overlap (i.e., $q(x) = 0$ somewhere $p(x) > 0$)
  • Minimizing KL divergence is equivalent to maximizing log likelihood (see the derivation after this list)
  • KL divergence is a popular distance measure
  • It equals cross-entropy minus entropy, $D_{KL}(p \| q) = H(p, q) - H(p)$, and is not a true distance metric
  • For KL divergence, MLE corresponds directly to KL matching
  • Has an analytic solution if both $p$ and $q$ follow the normal distribution
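The MLE correspondence in one line, writing $p$ for the data distribution and $q_\theta$ for the model:

$D_{KL}(p \| q_\theta) = \mathbb{E}_{x \sim p}[\log p(x)] - \mathbb{E}_{x \sim p}[\log q_\theta(x)]$

The first term is constant in $\theta$, so minimizing $D_{KL}(p \| q_\theta)$ over $\theta$ is the same as maximizing the expected log likelihood $\mathbb{E}_{x \sim p}[\log q_\theta(x)]$.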
In forward KL, $q$ is trained to cover $p$ (see the mode covering section below).
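As a concrete sanity check of the definition and its asymmetry, a minimal NumPy sketch (the example distributions are illustrative, not from the source):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions given as probability arrays."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    # eps guards against log(0); where q = 0 but p > 0 the true KL diverges to infinity
    return np.sum(p * np.log((p + eps) / (q + eps)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

print(kl_divergence(p, q))  # ~0.085
print(kl_divergence(q, p))  # ~0.092, a different value: KL is asymmetric
print(kl_divergence(p, p))  # 0.0: identical distributions
```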

Analytic KL divergence

Since KL divergence yields different values depending on the order of its arguments, the order is chosen to match the objective when it is used as a loss. The closed form differs by probability distribution.
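For example, between two univariate Gaussians the closed form is:

$D_{KL}\big(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\big) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$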

Mode covering - Forward $KL(p \| q)$

Minimizing the forward KL maximizes $q$ wherever $p$ is large, so $q$ spreads out to cover the whole distribution.

Mode seeking - Backward $KL(q \| p)$

Minimizing the reverse KL $D_{KL}(q(x) \| p(x))$ maximizes $p$ wherever $q$ is large, so $q$ converges onto a single mode of the distribution.
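A minimal sketch contrasting the two objectives: fitting a single Gaussian $q$ to a bimodal $p$ on a discretized grid by brute-force search. The bimodal mixture and search grids are illustrative assumptions, not from the source.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl(a, b, eps=1e-12):
    return np.sum(a * np.log((a + eps) / (b + eps)))

# Bimodal target p: equal mixture of two Gaussians, discretized on a grid (illustrative)
x = np.linspace(-6, 6, 601)
p = 0.5 * normal_pdf(x, -2, 0.7) + 0.5 * normal_pdf(x, 2, 0.7)
p /= p.sum()

# Fit a single Gaussian q under each objective by grid search over (mu, sigma)
best_fwd = best_rev = (np.inf, None, None)
for mu in np.linspace(-3, 3, 61):
    for sigma in np.linspace(0.3, 3.0, 28):
        q = normal_pdf(x, mu, sigma)
        q /= q.sum()
        best_fwd = min(best_fwd, (kl(p, q), mu, sigma))  # forward KL(p||q)
        best_rev = min(best_rev, (kl(q, p), mu, sigma))  # reverse KL(q||p)

# Mode covering: forward KL should pick a wide q straddling both modes (mu near 0)
print("forward KL(p||q) best (kl, mu, sigma):", best_fwd)
# Mode seeking: reverse KL should pick a narrow q locked onto one mode (mu near ±2)
print("reverse KL(q||p) best (kl, mu, sigma):", best_rev)
```

Forward KL favors a wide $q$ (roughly moment matching), while reverse KL concentrates $q$ on one of the two modes.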