Intuitively, MLE finds the θ that maximizes the probability of the data, while MAP finds the most probable parameters given the data. Because the Bayes denominator $p(\mathcal{D})$ does not depend on $\theta$, maximizing the posterior amounts to maximizing the likelihood times the prior; since MAP thus adds a prior term on top of the likelihood, it can be viewed as a generalized form of MLE (with a flat prior the two coincide).
$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \, p(\theta \mid \mathcal{D}) = \arg\max_{\theta} \, \log p(\theta \mid \mathcal{D}) = \arg\max_{\theta} \, \big[ \log p(\mathcal{D} \mid \theta) + \log p(\theta) \big]$$
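As a concrete illustration, here is a minimal numeric sketch of both estimators for a Bernoulli parameter; the coin-flip data, the Beta(2, 2) prior, and the grid search are all hypothetical choices, not taken from the text:

```python
# Hypothetical sketch: MLE vs. MAP for a Bernoulli parameter theta,
# with an assumed Beta(a, b) prior, found by grid search.
import numpy as np

data = np.array([1, 1, 1, 0, 1])   # assumed coin flips: 4 heads, 1 tail
a, b = 2.0, 2.0                    # assumed Beta prior hyperparameters

thetas = np.linspace(1e-6, 1 - 1e-6, 10_001)
log_lik = data.sum() * np.log(thetas) + (len(data) - data.sum()) * np.log1p(-thetas)
log_prior = (a - 1) * np.log(thetas) + (b - 1) * np.log1p(-thetas)

theta_mle = thetas[np.argmax(log_lik)]              # argmax_theta log p(D | theta)
theta_map = thetas[np.argmax(log_lik + log_prior)]  # argmax_theta log p(D | theta) + log p(theta)

print(theta_mle, theta_map)  # approx. 0.8 (= 4/5) and 0.714 (= 5/7)
```

The prior pulls the MAP estimate toward its mean of 1/2, which is exactly the regularizing effect discussed below.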
a priori means 'from the earlier'
a posteriori means 'from the later'
MAP finds the parameters $\hat{\theta}_{\mathrm{MAP}}$ that maximize the posterior distribution:
we assume $\theta$ itself follows some distribution (the prior) and find the optimal $\theta$ under the resulting posterior
We assume a zero-mean Gaussian prior with covariance $\Sigma$ for the parameters $\theta$, i.e. $p(\theta) = \mathcal{N}(\theta \mid 0, \Sigma)$.
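As a sketch of why this prior acts as a regularizer: assuming additionally a linear-Gaussian likelihood with noise variance $\sigma^2$ and an isotropic prior $\Sigma = \tau^2 I$ (both assumptions of this example, not stated above), the MAP estimate takes the closed form of ridge regression:

```python
# Sketch under an assumed linear-Gaussian model: y = X theta + noise,
# noise ~ N(0, sigma2 * I), prior theta ~ N(0, tau2 * I).
# The MAP estimate is then ridge regression with penalty lam = sigma2 / tau2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])    # hypothetical true parameters
sigma2, tau2 = 0.25, 1.0                   # assumed noise / prior variances
y = X @ theta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = sigma2 / tau2
# Closed-form MAP solution: (X^T X + lam * I)^{-1} X^T y
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(theta_map)   # close to theta_true; lam shrinks the estimate toward the prior mean 0
```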
$$\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, \theta) + \lambda C(\theta)$$
with $\lambda \geq 0$ called the regularization parameter and $C(\theta)$ a measure of model complexity.
If we use the log loss $\ell(y_i, \theta) = -\log p(y_i \mid \theta)$, a common penalty is $C(\theta) = -\log p(\theta)$, where $p(\theta)$ is the prior. Setting $\lambda = \frac{1}{n}$ gives

$$\mathcal{L}(\theta) = -\frac{1}{n}\Big[\sum_{i=1}^{n} \log p(y_i \mid \theta) + \log p(\theta)\Big],$$

so, ignoring the factor $\frac{1}{n}$ (which does not depend on $\theta$), minimizing $\mathcal{L}(\theta)$ is equivalent to MAP estimation.
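A quick numeric check of this equivalence, reusing the hypothetical Bernoulli/Beta setup from the first sketch: with log loss, $C(\theta) = -\log p(\theta)$, and $\lambda = \frac{1}{n}$, the minimizer of the regularized loss lands on the same grid point as the MAP estimate.

```python
# Verify numerically: argmin of the regularized loss == argmax of the log-posterior.
import numpy as np

data = np.array([1, 1, 1, 0, 1])   # same hypothetical coin flips as above
n = len(data)
a, b = 2.0, 2.0                    # same assumed Beta prior

thetas = np.linspace(1e-6, 1 - 1e-6, 10_001)
log_lik = data.sum() * np.log(thetas) + (n - data.sum()) * np.log1p(-thetas)
log_prior = (a - 1) * np.log(thetas) + (b - 1) * np.log1p(-thetas)

# L(theta) = (1/n) * sum_i [-log p(y_i | theta)] + lambda * C(theta), lambda = 1/n
reg_loss = (1 / n) * (-log_lik) + (1 / n) * (-log_prior)
assert np.argmin(reg_loss) == np.argmax(log_lik + log_prior)
```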