We can measure model complexity by computing a norm of the parameters, e.g. the squared L2 norm $\Omega(h) = \|\mathbf{w}\|_2^2$.
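As a concrete illustration, here is a minimal sketch of the squared L2 norm used as a complexity measure (assuming the parameters are collected in a NumPy vector; the function name is illustrative, not from the notes):

```python
import numpy as np

def l2_complexity(w: np.ndarray) -> float:
    """Squared L2 norm of the parameter vector, used as the complexity measure Omega(h)."""
    return float(np.dot(w, w))

# Larger-magnitude weights count as a more complex model.
print(l2_complexity(np.array([0.1, -0.2, 0.05])))  # small complexity
print(l2_complexity(np.array([5.0, -8.0, 3.0])))   # large complexity
```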
How can we retain the benefits of both underfitting (simplicity, low variance) and overfitting (a close fit to the data)? Include model complexity as part of the cost; this limits the influence of any individual data point.
The augmented error is the sum of how badly the model fits the data and the complexity of the model (the bias-variance trade-off):
$$E_{\text{aug}}(h) = E_{\text{in}}(h) + \lambda\,\Omega(h),$$
with $\lambda$ called the regularization parameter and $\Omega(h)$ a measure of complexity. Model regularization thus mitigates the effect of any single data point.
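For example, a minimal sketch of the augmented error for linear regression with an L2 penalty (the array names `X`, `y`, `w` and the use of mean squared error are illustrative assumptions, not specified in the notes):

```python
import numpy as np

def augmented_error(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    """E_aug = E_in (mean squared error) + lam * Omega(h) (squared L2 norm)."""
    residuals = X @ w - y
    e_in = float(np.mean(residuals ** 2))   # how badly the model fits the data
    omega = float(np.dot(w, w))             # complexity of the model
    return e_in + lam * omega

# Example usage with toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)
print(augmented_error(true_w, X, y, lam=0.1))
```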
If we use the log-likelihood function, a common penalty is $-\log p(\theta)$, where $p(\theta)$ is the prior. Using the log form of Bayes' theorem,
$$\log p(\theta \mid D) = \log p(D \mid \theta) + \log p(\theta) - \log p(D),$$
setting the cost to the negative log-likelihood plus this penalty, and ignoring $\log p(D)$, which does not depend on $\theta$, minimizing this cost is equivalent to maximizing the log posterior: the maximum a posteriori (MAP) estimate.
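As a concrete instance (assuming a Gaussian prior, a standard choice not stated explicitly above), taking $p(\theta) = \mathcal{N}(0, \sigma^2 I)$ gives
$$-\log p(\theta) = \frac{1}{2\sigma^2}\|\theta\|_2^2 + \text{const},$$
so the MAP estimate coincides with the L2-regularized solution with $\lambda = \tfrac{1}{2\sigma^2}$.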
The Notion of Model Regularization