finds the parameters θ~MAP maximizing a posteriori distribution
assume θ also has some distribution and find optimal θ
We assume a zero-mean Gaussian prior with covariance Σ for parameters θ
L(θ)=n1∑i=1nl(y,θ)+λC(θ)
with λ≥0 called the regularization parameter and C is a measure of complexity
If we use the Log-likelihood function, a common penalty is to use C(θ)=−logp(θ) where p(θ) is the prior. By setting λ=n1, and ignoring the n1 which does not depend on θ.