Relative Entropy, Kullback-Leibler Divergence, I-divergence
A measure for comparing two probability distributions. It is asymmetric, so despite common usage it is not a true metric.
The divergence D_KL(p ‖ q) = Σ_x p(x) log(p(x)/q(x)) is large when q(x) is small where p(x) is large, because the ratio p(x)/q(x) inside the log blows up. In other words, a bigger probability difference produces a bigger divergence.
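A minimal sketch of this behavior for discrete distributions, assuming NumPy and example distributions `p` and `q` chosen here for illustration:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    # Terms with p(x) == 0 contribute 0 by the convention 0 * log 0 = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.2, 0.2, 0.6])

print(kl_divergence(p, q))  # large contribution where q is small but p is large
print(kl_divergence(q, p))  # a different value: KL is asymmetric
```

Swapping the arguments gives a different number, which is exactly why KL divergence is not a metric.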
- Equals 0 if the two distributions are identical, and is greater than 0 otherwise (i.e., it measures the gap between the expected information and the observed outcome, e.g., prior vs. posterior)
- Here "divergence" simply means difference, not distance
- Greater than or equal to 0, by Gibbs' inequality
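One standard way to see this non-negativity is Jensen's inequality applied to the convex function −log (a textbook derivation, sketched here for discrete distributions):

```latex
D_{\mathrm{KL}}(p \,\|\, q)
= -\sum_x p(x) \log \frac{q(x)}{p(x)}
\;\ge\; -\log \sum_x p(x) \cdot \frac{q(x)}{p(x)}
= -\log \sum_x q(x) = 0
```

with equality if and only if p = q.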
- Minimizing the KL divergence is, in the end, equivalent to maximizing the log likelihood
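A one-line derivation of this equivalence, writing the data distribution as p_data and the model as q_θ:

```latex
\min_\theta D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, q_\theta\right)
= \min_\theta \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x) - \log q_\theta(x)\right]
= \max_\theta \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log q_\theta(x)\right]
```

The entropy term does not depend on θ, and the remaining expected log likelihood is exactly the MLE objective.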
- KL divergence is a popular measure of dissimilarity between distributions
- Equal to the cross-entropy minus the entropy: D_KL(p ‖ q) = H(p, q) − H(p). It is not a distance in the metric sense
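The cross-entropy-minus-entropy identity can be checked numerically; a small sketch, assuming NumPy and example distributions chosen here:

```python
import numpy as np

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.2, 0.2, 0.6])

entropy = -np.sum(p * np.log(p))        # H(p)
cross_entropy = -np.sum(p * np.log(q))  # H(p, q)
kl = np.sum(p * np.log(p / q))          # D_KL(p || q)

# The identity D_KL(p || q) = H(p, q) - H(p) holds exactly.
print(np.isclose(kl, cross_entropy - entropy))  # True
```

This is why minimizing cross-entropy loss against fixed targets is the same as minimizing KL divergence: H(p) is constant with respect to the model.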
- In the case of KL divergence, there is a correspondence between MLE and KL matching
- Has an analytic (closed-form) solution if both p and q follow normal distributions
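For univariate Gaussians the closed form is D_KL(N(μ₁, σ₁²) ‖ N(μ₂, σ₂²)) = log(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − 1/2. A sketch that checks it against numerical integration, assuming NumPy and parameter values picked here for illustration:

```python
import numpy as np

def gaussian_kl(mu1, sigma1, mu2, sigma2):
    """Closed-form D_KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

# Verify against numerical integration on a fine grid.
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
x = np.linspace(-15, 15, 200001)
p = np.exp(-(x - mu1)**2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
q = np.exp(-(x - mu2)**2 / (2 * s2**2)) / (s2 * np.sqrt(2 * np.pi))
numeric = np.trapz(p * np.log(p / q), x)

print(gaussian_kl(mu1, s1, mu2, s2))  # analytic value
print(numeric)                        # should agree closely
```

This closed form is what makes the KL term cheap to compute in models such as VAEs, where both distributions are Gaussian.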
- q is trained so that it covers p (in the forward direction)
Analytic KL divergence
Since KL divergence yields different values depending on the order of its arguments, when using a KL loss we choose the order based on the objective. The closed form also depends on the probability distributions involved.
Mode covering - Forward
Minimizing the forward KL D_KL(p ‖ q) forces q(x) to be large wherever p(x) is large, so q spreads out to cover all modes of p.
Mode seeking - Backward
Minimizing the reverse KL D_KL(q ‖ p) penalizes q for placing mass where p is small, so q concentrates on a single mode of p.
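The covering-vs-seeking contrast can be demonstrated with a toy experiment: fit a single Gaussian q to a bimodal target p by grid search under each objective. A sketch assuming NumPy; the target mixture, grid ranges, and step sizes are choices made here for illustration:

```python
import numpy as np

# Bimodal target p on a grid; a single Gaussian q(mu, sigma) approximates it.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3, 1) + 0.5 * gauss(x, 3, 1)

def kl(a, b):
    """Discretized D_KL(a || b) over the grid."""
    mask = a > 1e-300
    return np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx

# Grid search over (mu, sigma) for both objectives.
best_fwd, best_rev = None, None
for mu in np.linspace(-5, 5, 41):
    for sigma in np.linspace(0.5, 5, 19):
        q = gauss(x, mu, sigma)
        f, r = kl(p, q), kl(q, p)  # forward vs reverse
        if best_fwd is None or f < best_fwd[0]:
            best_fwd = (f, mu, sigma)
        if best_rev is None or r < best_rev[0]:
            best_rev = (r, mu, sigma)

print("forward KL (mode covering):", best_fwd[1:])  # wide q centered between modes
print("reverse KL (mode seeking):", best_rev[1:])   # narrow q locked onto one mode
```

The forward solution sits between the two modes with a large variance (covering), while the reverse solution picks one mode with a variance close to that mode's own (seeking).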