Intuitively, MLE finds the θ that maximizes the probability of the data, while MAP finds the most probable parameters given the data. Because the Bayes denominator $p(\mathcal{D})$ does not depend on $\theta$, maximizing the posterior amounts to maximizing the likelihood times the prior; since MAP thus adds a prior term on top of the likelihood, it can be viewed as a generalized form of MLE (with a flat prior the two coincide).
$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \, p(\theta \mid \mathcal{D}) = \arg\max_{\theta} \, \log p(\theta \mid \mathcal{D}) = \arg\max_{\theta} \, \big[ \log p(\mathcal{D} \mid \theta) + \log p(\theta) \big]$$
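As a concrete illustration, here is a minimal numeric sketch of both estimators for a Bernoulli parameter; the coin-flip data, the Beta(2, 2) prior, and the grid search are all hypothetical choices, not taken from the text:

```python
# Hypothetical sketch: MLE vs. MAP for a Bernoulli parameter theta,
# with an assumed Beta(a, b) prior, found by grid search.
import numpy as np

data = np.array([1, 1, 1, 0, 1])   # assumed coin flips: 4 heads, 1 tail
a, b = 2.0, 2.0                    # assumed Beta prior hyperparameters

thetas = np.linspace(1e-6, 1 - 1e-6, 10_001)
log_lik = data.sum() * np.log(thetas) + (len(data) - data.sum()) * np.log1p(-thetas)
log_prior = (a - 1) * np.log(thetas) + (b - 1) * np.log1p(-thetas)

theta_mle = thetas[np.argmax(log_lik)]              # argmax_theta log p(D | theta)
theta_map = thetas[np.argmax(log_lik + log_prior)]  # argmax_theta log p(D | theta) + log p(theta)

print(theta_mle, theta_map)  # approx. 0.8 (= 4/5) and 0.714 (= 5/7)
```

The prior pulls the MAP estimate toward its mean of 1/2, which is exactly the regularizing effect discussed below.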
a priori means 'from the earlier'
a posteriori means 'from the later'
MAP finds the parameters $\hat{\theta}_{\mathrm{MAP}}$ that maximize the posterior distribution:
we assume $\theta$ itself follows some distribution (the prior) and find the optimal $\theta$ under the resulting posterior
We assume a zero-mean Gaussian prior with covariance $\Sigma$ for the parameters $\theta$, i.e. $p(\theta) = \mathcal{N}(\theta \mid 0, \Sigma)$.
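As a sketch of why this prior acts as a regularizer: assuming additionally a linear-Gaussian likelihood with noise variance $\sigma^2$ and an isotropic prior $\Sigma = \tau^2 I$ (both assumptions of this example, not stated above), the MAP estimate takes the closed form of ridge regression:

```python
# Sketch under an assumed linear-Gaussian model: y = X theta + noise,
# noise ~ N(0, sigma2 * I), prior theta ~ N(0, tau2 * I).
# The MAP estimate is then ridge regression with penalty lam = sigma2 / tau2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])    # hypothetical true parameters
sigma2, tau2 = 0.25, 1.0                   # assumed noise / prior variances
y = X @ theta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = sigma2 / tau2
# Closed-form MAP solution: (X^T X + lam * I)^{-1} X^T y
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(theta_map)   # close to theta_true; lam shrinks the estimate toward the prior mean 0
```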
$$\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, \theta) + \lambda C(\theta)$$
with $\lambda \geq 0$ called the regularization parameter and $C(\theta)$ a measure of model complexity.
If we use the log loss $\ell(y_i, \theta) = -\log p(y_i \mid \theta)$, a common penalty is $C(\theta) = -\log p(\theta)$, where $p(\theta)$ is the prior. Setting $\lambda = \frac{1}{n}$ gives

$$\mathcal{L}(\theta) = -\frac{1}{n}\Big[\sum_{i=1}^{n} \log p(y_i \mid \theta) + \log p(\theta)\Big],$$

so, ignoring the factor $\frac{1}{n}$ (which does not depend on $\theta$), minimizing $\mathcal{L}(\theta)$ is equivalent to MAP estimation.
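A quick numeric check of this equivalence, reusing the hypothetical Bernoulli/Beta setup from the first sketch: with log loss, $C(\theta) = -\log p(\theta)$, and $\lambda = \frac{1}{n}$, the minimizer of the regularized loss lands on the same grid point as the MAP estimate.

```python
# Verify numerically: argmin of the regularized loss == argmax of the log-posterior.
import numpy as np

data = np.array([1, 1, 1, 0, 1])   # same hypothetical coin flips as above
n = len(data)
a, b = 2.0, 2.0                    # same assumed Beta prior

thetas = np.linspace(1e-6, 1 - 1e-6, 10_001)
log_lik = data.sum() * np.log(thetas) + (n - data.sum()) * np.log1p(-thetas)
log_prior = (a - 1) * np.log(thetas) + (b - 1) * np.log1p(-thetas)

# L(theta) = (1/n) * sum_i [-log p(y_i | theta)] + lambda * C(theta), lambda = 1/n
reg_loss = (1 / n) * (-log_lik) + (1 / n) * (-log_prior)
assert np.argmin(reg_loss) == np.argmax(log_lik + log_prior)
```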