Relative Entropy, Kullback-Leibler Divergence, I-divergence
A measure for comparing two distributions. It is asymmetric, so it is not a metric in the strict sense.
The divergence is large when q(x) is small where p(x) is large (mathematically, because of the fraction p(x)/q(x) inside the logarithm).
In other words, a larger mismatch in probability gives a larger divergence.
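For reference, the standard definition for discrete distributions p and q makes the fraction explicit. The continuous case replaces the sum with an integral. Each term is weighted by p(x), so a small q(x) at a point where p(x) is large dominates the sum:

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
```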
- It is 0 if the two distributions are the same and greater than 0 otherwise (i.e., the gap between the expected information and the outcome: prior vs. posterior)
- "Divergence" here simply means difference
- It is always greater than or equal to 0, by Gibbs' inequality
- Minimizing the KL divergence is ultimately the same as maximizing the log-likelihood
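- KL divergence is a popular measure of dissimilarity between distributions, and is often loosely called a distance
- It equals cross-entropy minus entropy: KL(p || q) = H(p, q) - H(p). It is not a true distance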
- With KL divergence, maximum likelihood estimation (MLE) corresponds to KL matching
- It has a closed-form (analytic) expression if both p and q follow normal distributions
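
The properties listed above can be checked numerically. The following is a minimal sketch, not a reference implementation. The function names and the example distributions `p` and `q` are illustrative assumptions, and the discrete helpers assume q(x) > 0 wherever p(x) > 0. It covers the zero value for identical distributions, the asymmetry, the cross-entropy identity, and the closed form for two univariate normals.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for univariate normals N(mu, sigma^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.1, 0.3, 0.6]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)

print("KL(p || p):", kl_divergence(p, p))             # identical distributions
print("KL(p || q):", kl_pq, " KL(q || p):", kl_qp)    # asymmetry
print("H(p, q) - H(p):", cross_entropy(p, q) - entropy(p))
print("Gaussian KL:", kl_gaussian(0.0, 1.0, 1.0, 2.0))
```

The link to MLE follows from the same identity: H(p) does not depend on q, so minimizing KL(p || q) over q is the same as minimizing the cross-entropy H(p, q), which is the expected negative log-likelihood under p.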
Mode covering
Minimizing the (forward) KL divergence KL(p || q) makes q(x) large wherever p(x) is large, so q spreads out to cover the whole of p.
Mode seeking
Minimizing the reverse KL divergence KL(q || p) requires p(x) to be large wherever q(x) is large, so q converges onto a single mode of p.
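
The contrast between the two behaviours can be seen by fitting a single Gaussian q to a bimodal target p. The sketch below is illustrative only: the mixture parameters, the grid ranges, and the brute-force grid search are all assumptions chosen for simplicity, and the integrals are approximated by Riemann sums with NumPy.

```python
import numpy as np

# Evaluation grid for the Riemann-sum approximation of the integrals.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Assumed target p: an equal mixture with modes at -3 and +3.
p = 0.5 * normal_pdf(x, -3.0, 0.7) + 0.5 * normal_pdf(x, 3.0, 0.7)


def kl(a, b, eps=1e-300):
    """Approximate KL(a || b) on the grid; eps guards against log(0)."""
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx)


def best_fit(direction):
    """Grid-search the (mu, sigma) of a single Gaussian q.

    direction="forward" minimizes KL(p || q);
    direction="reverse" minimizes KL(q || p).
    """
    best = None
    for mu in np.linspace(-5.0, 5.0, 41):
        for sigma in np.linspace(0.3, 5.0, 48):
            q = normal_pdf(x, mu, sigma)
            value = kl(p, q) if direction == "forward" else kl(q, p)
            if best is None or value < best[0]:
                best = (value, float(mu), float(sigma))
    return best


fwd_kl, fwd_mu, fwd_sigma = best_fit("forward")
rev_kl, rev_mu, rev_sigma = best_fit("reverse")

print("forward KL fit: mu =", fwd_mu, "sigma =", fwd_sigma)
print("reverse KL fit: mu =", rev_mu, "sigma =", rev_sigma)
```

Under these assumptions the forward fit should sit between the two modes with a wide sigma (mode covering), while the reverse fit should lock onto one of the two modes with a narrow sigma (mode seeking). Which of the two modes the reverse fit picks depends on tie-breaking in the search.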