Model Regularization

Creator: Seonglae Cho
Created: 2023 May 9 2:08
Edited: 2025 May 25 17:21
Refs:

We can measure model complexity by computing a norm of the parameters.
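For instance, two standard complexity measures (common choices, not specific to this note) are the L1 norm and the squared L2 norm:
$$C(\theta) = \|\theta\|_1 = \sum_j |\theta_j| \qquad \text{or} \qquad C(\theta) = \|\theta\|_2^2 = \sum_j \theta_j^2$$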

How can we keep the benefits of both a simple (underfitting) and a flexible (overfitting) model? Make complexity part of the cost: this limits the influence of individual data points.
Augmented Error is the sum of how badly the model fits the data and the complexity of the model (
Bias-Variance Trade-off
). Model regularization mitigates the effect of any single data point.
$$\text{Augmented Error} = \text{How Badly the Model Fits} + \text{Model Complexity}$$
$$L = L_{\text{data}} + L_{\text{reg}}$$
$$\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^n l(y_i, \theta) + \lambda C(\theta)$$
with $\lambda \ge 0$ called the regularization parameter and $C$ a measure of model complexity.
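As a minimal sketch (assuming a linear model with squared-error loss and a squared-L2 complexity term; the names and data here are illustrative, not from the note):

```python
import numpy as np

def augmented_loss(theta, X, y, lam):
    """L(theta) = (1/n) * sum_i l(y_i, theta) + lam * C(theta)."""
    data_loss = np.mean((y - X @ theta) ** 2)   # how badly the model fits
    complexity = np.sum(theta ** 2)             # C(theta): squared L2 norm
    return data_loss + lam * complexity

# Illustrative data: y depends on X through theta_true plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=100)

print(augmented_loss(theta_true, X, y, lam=0.1))  # small loss near the true parameters
```

Larger `lam` shrinks the parameters toward zero, trading a worse data fit for lower complexity.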
If we use the
Log-likelihood function
as the data loss, a common penalty is $C(\theta) = -\log p(\theta)$, where $p(\theta)$ is the prior. Setting $\lambda = \frac{1}{n}$ and dropping the overall $\frac{1}{n}$ factor, which does not affect the minimizer over $\theta$:
$$L(\theta) = -\sum_{i=1}^n \log p(Y_i \mid \theta) - \log p(\theta) = -\big(\log p(\mathcal{D} \mid \theta) + \log p(\theta)\big) = -\log p(\theta \mid \mathcal{D}) - \log p(\mathcal{D})$$
By the log form of
Bayes Theorem
, and since $\log p(\mathcal{D})$ does not depend on $\theta$, minimizing this is equivalent to maximizing the log posterior:
MAP
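For example (a standard result, assuming a zero-mean Gaussian prior), the penalty $C(\theta) = -\log p(\theta)$ reduces to an L2 penalty:
$$p(\theta) = \mathcal{N}(\theta \mid 0, \sigma^2 I) \;\Rightarrow\; -\log p(\theta) = \frac{\|\theta\|_2^2}{2\sigma^2} + \text{const}$$
so MAP estimation with a Gaussian prior is equivalent to adding a squared-L2 (ridge / weight-decay) term to the negative log-likelihood.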
Model Regularization Notion
