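Cross Entropy
Cross entropy H(p, q) is also H(p) + KL(p || q), and it is the expected negative log likelihood of q under p.
posterior, prior
Minimizing the KL divergence is the same as maximizing the log likelihood, because H(p) does not depend on q.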
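The relation between cross entropy, entropy, and KL divergence noted above can be checked numerically. The following is a minimal sketch, assuming discrete distributions given as plain lists of probabilities; the helper names (`entropy`, `cross_entropy`, `kl_divergence`, `expected_log_likelihood`) and the example values of `p` and the candidate `q` distributions are illustrative assumptions, not part of the original notes.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i)
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_log_likelihood(p, q):
    # Expected log likelihood of model q under data distribution p;
    # this is exactly the negative of the cross entropy.
    return sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: a fixed "true" distribution p and a few candidate models q.
p = [0.5, 0.3, 0.2]
candidates = [
    [0.4, 0.4, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
]

# Identity: H(p, q) = H(p) + KL(p || q)
q = candidates[0]
lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
identity_holds = math.isclose(lhs, rhs)

# H(p) is constant in q, so the q that minimizes KL also maximizes log likelihood.
best_by_kl = min(candidates, key=lambda c: kl_divergence(p, c))
best_by_ll = max(candidates, key=lambda c: expected_log_likelihood(p, c))

print(identity_holds)
print(best_by_kl == best_by_ll)
```

Because H(p) is the same for every candidate, ranking the candidates by KL divergence (ascending) and by expected log likelihood (descending) gives the same ordering, which is why the two selections agree.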