Model Regularization

Creator: Seonglae Cho
Created: 2023 May 9 2:08
Edited: 2025 Mar 25 12:30

We can quantify a model's complexity by computing a norm of its parameters.

Regularization limits the influence of any individual data point. The augmented error is the sum of how badly the model fits the data and the model's complexity (Bias-Variance Trade-off).
$L = L_{\text{data}} + L_{\text{reg}}$
$\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^n l(y_i, \theta) + \lambda C(\theta)$
with $\lambda \ge 0$ called the regularization parameter and $C$ a measure of complexity.
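A minimal sketch of this augmented loss, assuming a linear model with squared error as the per-example loss and the squared L2 norm as the complexity measure $C(\theta)$ (the function and variable names are illustrative):

```python
import numpy as np

def augmented_loss(theta, X, y, lam):
    """L(theta) = (1/n) * sum_i l(y_i, theta) + lambda * C(theta)."""
    n = len(y)
    residual = X @ theta - y
    data_loss = np.sum(residual ** 2) / n   # average per-example loss
    complexity = np.sum(theta ** 2)         # C(theta) = squared L2 norm
    return data_loss + lam * complexity
```

Larger `lam` penalizes large parameter norms more heavily, trading a worse data fit for lower complexity.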
If we use the Log-likelihood function, a common penalty is $C(\theta) = -\log p(\theta)$, where $p(\theta)$ is the prior. Setting $\lambda = \frac{1}{n}$ and dropping the overall $\frac{1}{n}$ factor (which does not change the minimizer in $\theta$), we get:
$L(\theta) = -\sum_{i=1}^n \log p(Y_i \mid \theta) - \log p(\theta) = -\big(\log p(D \mid \theta) + \log p(\theta)\big) = -\log p(\theta \mid D) - \log p(D)$
By the log form of Bayes' Theorem, $\log p(\theta \mid D) = \log p(D \mid \theta) + \log p(\theta) - \log p(D)$, and since $\log p(D)$ does not depend on $\theta$, minimizing this loss is equivalent to maximizing the log posterior:
MAP
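As a concrete (assumed) instance: with a zero-mean Gaussian prior $p(\theta) = \mathcal{N}(0, \sigma^2 I)$, the penalty $-\log p(\theta)$ equals $\frac{1}{2\sigma^2}\|\theta\|_2^2$ plus a constant, so L2 (ridge) regularization with $\lambda = \frac{1}{2\sigma^2}$ is MAP estimation under that prior. A small numeric check (the values are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

theta = np.array([0.5, -1.0, 2.0])  # illustrative parameter vector
sigma2 = 4.0                        # assumed prior variance
d = theta.size

# -log p(theta) under the Gaussian prior p(theta) = N(0, sigma2 * I)
neg_log_prior = -multivariate_normal.logpdf(
    theta, mean=np.zeros(d), cov=sigma2 * np.eye(d)
)

# Theta-dependent part matches an L2 penalty with lambda = 1 / (2 * sigma2)
lam = 1.0 / (2.0 * sigma2)
const = 0.5 * d * np.log(2.0 * np.pi * sigma2)  # independent of theta
assert np.isclose(neg_log_prior, lam * np.sum(theta ** 2) + const)
```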