Model Regularization

Creator: Seonglae Cho
Created: 2023 May 9 2:08
Edited: 2025 May 25 17:21
Refs:

We can measure model complexity by computing a norm of the parameters.
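For instance, two standard complexity measures (common choices, not specific to this note) are the L1 norm and the squared L2 norm:
$$C(\theta) = \|\theta\|_1 = \sum_j |\theta_j| \qquad \text{or} \qquad C(\theta) = \|\theta\|_2^2 = \sum_j \theta_j^2$$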

How can we keep the benefits of both a simple (underfitting) and a flexible (overfitting) model? Make complexity part of the cost: this limits the influence of individual data points.
Augmented Error is the sum of how badly the model fits the data and the complexity of the model (
Bias-Variance Trade-off
). Model regularization mitigates the effect of any single data point.
$$\text{Augmented Error} = \text{How Badly the Model Fits} + \text{Model Complexity}$$
$$L = L_{\text{data}} + L_{\text{reg}}$$
$$\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^n l(y_i, \theta) + \lambda C(\theta)$$
with $\lambda \ge 0$ called the regularization parameter and $C$ a measure of model complexity.
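As a minimal sketch (assuming a linear model with squared-error loss and a squared-L2 complexity term; the names and data here are illustrative, not from the note):

```python
import numpy as np

def augmented_loss(theta, X, y, lam):
    """L(theta) = (1/n) * sum_i l(y_i, theta) + lam * C(theta)."""
    data_loss = np.mean((y - X @ theta) ** 2)   # how badly the model fits
    complexity = np.sum(theta ** 2)             # C(theta): squared L2 norm
    return data_loss + lam * complexity

# Illustrative data: y depends on X through theta_true plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=100)

print(augmented_loss(theta_true, X, y, lam=0.1))  # small loss near the true parameters
```

Larger `lam` shrinks the parameters toward zero, trading a worse data fit for lower complexity.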
If we use the
Log-likelihood function
as the data loss, a common penalty is $C(\theta) = -\log p(\theta)$, where $p(\theta)$ is the prior. Setting $\lambda = \frac{1}{n}$ and dropping the overall $\frac{1}{n}$ factor, which does not affect the minimizer over $\theta$:
$$L(\theta) = -\sum_{i=1}^n \log p(Y_i \mid \theta) - \log p(\theta) = -\big(\log p(\mathcal{D} \mid \theta) + \log p(\theta)\big) = -\log p(\theta \mid \mathcal{D}) - \log p(\mathcal{D})$$
By the log form of
Bayes Theorem
, and since $\log p(\mathcal{D})$ does not depend on $\theta$, minimizing this is equivalent to maximizing the log posterior:
MAP
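For example (a standard result, assuming a zero-mean Gaussian prior), the penalty $C(\theta) = -\log p(\theta)$ reduces to an L2 penalty:
$$p(\theta) = \mathcal{N}(\theta \mid 0, \sigma^2 I) \;\Rightarrow\; -\log p(\theta) = \frac{\|\theta\|_2^2}{2\sigma^2} + \text{const}$$
so MAP estimation with a Gaussian prior is equivalent to adding a squared-L2 (ridge / weight-decay) term to the negative log-likelihood.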
Model Regularization Notion
