Maximum A Posteriori
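Intuitively, MLE finds the $\theta$ that maximizes the probability of the data, $p(D \mid \theta)$, while MAP finds the most probable $\theta$ given the data, $p(\theta \mid D)$. By Bayes' theorem the posterior is the likelihood times the prior, divided by the evidence $p(D)$. Because the evidence does not depend on $\theta$, MAP effectively maximizes likelihood times prior. With a uniform prior the two estimates coincide, so MAP can be considered a generalized form of MLE.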
- a priori means ‘from the earlier’
- a posteriori means ‘from the later’
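MAP finds the parameters $\theta$ that maximize the posterior distribution: $\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid D)$.
Unlike MLE, we assume $\theta$ also has some distribution, the prior $p(\theta)$, and find the optimal $\theta$ under it.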
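As a small illustration of how the prior changes the estimate, here is a sketch for a coin with unknown heads probability. The Beta(a, b) prior, the function names, and the data are assumptions chosen for this example and do not come from the notes themselves.

```python
def mle_bernoulli(heads: int, flips: int) -> float:
    """Maximum likelihood estimate of the heads probability."""
    return heads / flips


def map_bernoulli(heads: int, flips: int, a: float = 2.0, b: float = 2.0) -> float:
    """Posterior mode under an assumed Beta(a, b) prior.

    The posterior is Beta(heads + a, tails + b); this is the formula for
    its mode, which requires flips + a + b - 2 > 0.
    """
    return (heads + a - 1.0) / (flips + a + b - 2.0)


heads, flips = 3, 3  # hypothetical data: three flips, all heads

theta_mle = mle_bernoulli(heads, flips)                  # ignores the prior
theta_map = map_bernoulli(heads, flips)                  # Beta(2, 2) prior
theta_flat = map_bernoulli(heads, flips, a=1.0, b=1.0)   # uniform prior

print("MLE:", theta_mle)
print("MAP, Beta(2, 2) prior:", theta_map)
print("MAP, uniform prior:", theta_flat)
```

With only three flips, MLE commits to a heads probability of 1.0, while the Beta(2, 2) prior pulls the MAP estimate back to 0.8. Under the uniform Beta(1, 1) prior the MAP estimate equals the MLE.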
We assume a zero-mean Gaussian prior with covariance $\Sigma$ for the parameters: $p(\theta) = \mathcal{N}(\theta \mid 0, \Sigma)$.
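To see what this prior contributes to the objective, it may help to write out its negative log. The isotropic case $\Sigma = \tau^2 I$ in the second line is an extra assumption made here for illustration, and $d$ denotes the number of parameters.

```latex
\begin{aligned}
-\log p(\theta)
  &= \tfrac{1}{2}\,\theta^{\top}\Sigma^{-1}\theta
     + \tfrac{1}{2}\log\lvert 2\pi\Sigma\rvert \\
  &= \frac{1}{2\tau^{2}}\,\lVert\theta\rVert_2^{2}
     + \frac{d}{2}\log\!\left(2\pi\tau^{2}\right)
  \qquad \text{(assuming } \Sigma = \tau^{2} I\text{)}
\end{aligned}
```

The second term does not depend on $\theta$, so under this assumption the Gaussian prior acts as a squared-norm (L2) penalty on the parameters, with a smaller $\tau^2$ giving a stronger pull toward zero.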
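Regularized estimation minimizes an objective of the form $L(\theta) + \lambda C(\theta)$, with $\lambda \ge 0$ called the regularization parameter and $C(\theta)$ a measure of complexity.
If we use the negative log-likelihood as the loss, $L(\theta) = -\log p(D \mid \theta)$, a common penalty is to use $C(\theta) = -\log p(\theta)$, where $p(\theta)$ is the prior. By setting $\lambda = 1$, and ignoring the term $\log p(D)$, which does not depend on $\theta$, the objective becomes $-[\log p(D \mid \theta) + \log p(\theta)]$.
When we use the log form of Bayes' theorem, $\log p(\theta \mid D) = \log p(D \mid \theta) + \log p(\theta) - \log p(D)$, minimizing this objective is equivalent to maximizing the log posterior, which is MAP estimation.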
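The equivalence can be checked numerically on a toy problem. The sketch below assumes a scalar parameter (the mean of Gaussian data with known noise scale `sigma`) and a zero-mean Gaussian prior with scale `tau`. These names, the data, and the grid search are illustrative assumptions, not something the notes prescribe.

```python
import math


def neg_log_likelihood(theta, data, sigma=1.0):
    """-log p(D | theta) for i.i.d. Gaussian data with known sigma."""
    n = len(data)
    squared_error = sum((x - theta) ** 2 for x in data)
    return squared_error / (2 * sigma**2) + n * math.log(sigma * math.sqrt(2 * math.pi))


def neg_log_prior(theta, tau=1.0):
    """-log p(theta) for a zero-mean Gaussian prior with scale tau."""
    return theta**2 / (2 * tau**2) + math.log(tau * math.sqrt(2 * math.pi))


def objective(theta, data, sigma=1.0, tau=1.0):
    """Penalized loss with the regularization parameter fixed to 1."""
    return neg_log_likelihood(theta, data, sigma) + neg_log_prior(theta, tau)


def map_closed_form(data, sigma=1.0, tau=1.0):
    """Mode of the Gaussian posterior over theta (known result for this model)."""
    n = len(data)
    return (sum(data) / sigma**2) / (n / sigma**2 + 1 / tau**2)


data = [1.8, 2.1, 2.4, 1.9]  # hypothetical observations

# Minimize the penalized objective by brute force on a grid over [0, 3].
grid = [i / 1000 for i in range(3001)]
theta_grid = min(grid, key=lambda t: objective(t, data))

theta_exact = map_closed_form(data)
theta_mle = sum(data) / len(data)

print("grid minimizer of the penalized loss:", theta_grid)
print("closed-form posterior mode:", theta_exact)
print("MLE (sample mean):", theta_mle)
```

The grid minimizer of the penalized negative log-likelihood agrees with the closed-form posterior mode up to the grid resolution, and both sit closer to zero than the MLE because of the prior.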