Relative Entropy, Kullback-Leibler Divergence, I-divergence
A measure for comparing two probability distributions (asymmetric, so not a true metric).
The divergence D_KL(p || q) = sum_x p(x) log(p(x) / q(x)) is large when q(x) is small where p(x) is large (the fraction p(x)/q(x) inside the log blows up).
In other words, a bigger probability difference between the two distributions makes a bigger divergence.
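A minimal sketch of this behavior for discrete distributions, assuming NumPy; the toy vectors p, q_close, and q_far below are made-up examples, not from the original notes.

```python
# Discrete KL divergence: D_KL(p || q) = sum_x p(x) * log(p(x) / q(x))
import numpy as np

def kl_divergence(p, q):
    """KL divergence between two discrete distributions on the same support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # 0 * log 0 is treated as 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p       = [0.7, 0.2, 0.1]
q_close = [0.6, 0.3, 0.1]   # similar to p             -> small divergence
q_far   = [0.1, 0.1, 0.8]   # q small where p is large -> large divergence

print(kl_divergence(p, q_close))  # ~0.03
print(kl_divergence(p, q_far))    # ~1.29
```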
- Equals 0 when the two distributions are identical, and is greater than 0 otherwise (it measures the gap between the expected information and the observed one, e.g. prior vs. posterior)
- "Divergence" here just means difference (discrepancy)
- It diverges to infinity if the supports do not overlap (q(x) = 0 somewhere p(x) > 0)
- It is greater than or equal to 0 by Gibbs' inequality
- Minimizing KL-divergence is equivalent to maximizing log likelihood
- KL divergence is a popular distance-like measure
- It equals cross-entropy minus entropy: D_KL(p || q) = H(p, q) - H(p). It is not a true distance metric (see the numeric check after this list)
- For KL divergence we have the correspondence between MLE and KL matching: maximizing the likelihood of the data is equivalent to minimizing the KL divergence from the empirical distribution to the model
- It has an analytic (closed-form) solution if both p and q follow normal distributions
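As a small numeric check of the cross-entropy-minus-entropy identity above, the sketch below (with made-up distributions) compares H(p, q) - H(p) against SciPy's entropy, which returns D_KL(p || q) when given two distributions.

```python
# Check that D_KL(p || q) = H(p, q) - H(p) (cross-entropy minus entropy).
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes D_KL(p || q)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

h_p  = -np.sum(p * np.log(p))     # entropy H(p)
h_pq = -np.sum(p * np.log(q))     # cross-entropy H(p, q)

print(h_pq - h_p, entropy(p, q))  # the two values agree
```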
When minimizing the forward KL, q is trained to cover p (see mode covering below).
Analytic KL divergence
Since KL divergence yields different values depending on the order of its arguments, when using a KL loss we choose the order based on the objective. The closed form differs depending on the probability distributions involved.
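A sketch of the analytic case for univariate Gaussians, using the standard closed form D_KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) = log(sigma2/sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 sigma2^2) - 1/2; the parameter values are arbitrary, and the Monte Carlo estimate is included only as a sanity check.

```python
# Closed-form KL divergence between two univariate Gaussians,
# checked against a Monte Carlo estimate E_{x~p}[log p(x) - log q(x)].
import numpy as np
from scipy.stats import norm

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """D_KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), analytic form."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 1.0, 2.0

rng = np.random.default_rng(0)
x = rng.normal(mu1, sigma1, size=200_000)
mc = np.mean(norm.logpdf(x, mu1, sigma1) - norm.logpdf(x, mu2, sigma2))

print(kl_gauss(mu1, sigma1, mu2, sigma2))  # ~0.443
print(mc)                                  # close to the analytic value
```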
Mode covering - Forward
Minimizing the forward KL D_KL(p || q) = E_p[log p(x)/q(x)] forces q(x) to be large wherever p(x) is large, so q spreads out to cover the whole distribution p (see the toy comparison below).
Mode seeking - Backward
Minimizing the reverse KL D_KL(q || p) = E_q[log q(x)/p(x)] forces q(x) to be small wherever p(x) is small, so q concentrates on (seeks) a single mode of p.
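A toy comparison of the two directions, assuming NumPy/SciPy: a single Gaussian q is fitted to a made-up bimodal target p by brute-force grid search, once minimizing the forward KL and once the reverse KL; the grids and the mixture are illustrative choices, not a recommended fitting procedure.

```python
# Mode covering vs. mode seeking: fit one Gaussian q to a bimodal target p.
import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

# Bimodal target p: mixture of two well-separated Gaussians.
p = 0.5 * norm.pdf(x, -3, 1) + 0.5 * norm.pdf(x, 3, 1)

def kl(a, b):
    """Discretized D_KL(a || b) for densities evaluated on the grid x."""
    mask = a > 1e-12
    return np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx

best_fwd = best_rev = None
for mu in np.linspace(-4, 4, 81):
    for sigma in np.linspace(0.5, 4, 36):
        q = norm.pdf(x, mu, sigma)
        fwd, rev = kl(p, q), kl(q, p)
        if best_fwd is None or fwd < best_fwd[0]:
            best_fwd = (fwd, mu, sigma)
        if best_rev is None or rev < best_rev[0]:
            best_rev = (rev, mu, sigma)

print("forward KL (mode covering):", best_fwd)  # mu ~ 0, large sigma
print("reverse KL (mode seeking): ", best_rev)  # mu ~ +/-3, sigma ~ 1
```

The forward fit lands on a broad Gaussian straddling both modes, while the reverse fit locks onto one of the two modes, which is exactly the covering vs. seeking behavior described above.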