Ridge regression

Creator
Seonglae Cho
Created
2023 Mar 14 2:28
Edited
2025 Mar 24 17:49

Performs regularization, a method that accepts an increase in bias in order to reduce variance

Regularization
The concept was first applied in the ridge regression model. Unlike ordinary linear regression, ridge regression also minimizes a regularizer, not only the fitting error.

Linear regression + L2 norm penalty:
$$J(w) = \|y - Xw\|_2^2 + \lambda \|w\|_2^2$$

The minimizer is found by setting the derivative of the cost function to zero.
  • $\lambda$ controls the tradeoff between the fitting error and the regularizer (overfitting vs. underfitting)
  • alpha, lambda, regularization parameter, and penalty term all name the same quantity
Ridge regression shrinks the model’s coefficients but does not make them exactly zero.
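A minimal sketch of this shrinkage behavior, using the closed-form solution $w = (X^\top X + \lambda I)^{-1} X^\top y$ on hypothetical toy data (the sample size, true weights, and lambda values below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # 50 samples, 3 features (toy data)
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam*I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [0.0, 1.0, 100.0]:
    w = ridge(X, y, lam)
    # The coefficient norm shrinks as lambda grows,
    # but no coefficient is driven exactly to zero.
    print(lam, np.round(w, 3))
```

Increasing `lam` pulls all coefficients toward zero without zeroing any of them, which is the contrast usually drawn with Lasso (L1) regularization.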

Dual solution (dual representation)

For ridge regression, the cost function is $J(w) = \|y - Xw\|_2^2 + \lambda \|w\|_2^2$.
Setting the derivative of the cost function with respect to $w$ to 0 yields the following equation:
$$w = (X^\top X + \lambda I)^{-1} X^\top y$$
Alternatively, we can rewrite the equation in terms of $\alpha$:
$$w = X^\top \alpha$$
Where $\alpha = \lambda^{-1}(y - Xw)$. $w$ is thus a linear combination of the training examples. Dual representation refers to learning by expressing the model parameters as a linear combination of training samples instead of learning them directly (primal representation).
The dual representation with proper regularization enables an efficient solution when p > N (
Sample-to-feature ratio
) since the complexity of the problem depends on the number of examples $N$ instead of on the number of input dimensions $p$.
We have two distinct methods for solving the ridge regression optimization:
  • Primal solution (explicit weight vector): $w = (X^\top X + \lambda I_p)^{-1} X^\top y$
  • Dual solution (linear combination of training examples): $w = X^\top \alpha$, with $\alpha = (X X^\top + \lambda I_N)^{-1} y$
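The two solutions above are algebraically identical (by the push-through identity $(X^\top X + \lambda I)^{-1} X^\top = X^\top (X X^\top + \lambda I)^{-1}$), differing only in whether a $p \times p$ or an $N \times N$ system is solved. A sketch verifying this on toy data with p > N (dimensions and lambda are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 10, 40                           # more features than samples (p > N)
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
lam = 0.5

# Primal: solve a p x p system for the weight vector directly
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Dual: solve an N x N system for alpha, then expand into weights
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))    # True
```

When p > N, the dual route inverts the smaller $N \times N$ matrix, which is exactly the efficiency argument made above.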
The crucial observation about the dual solution is that the information from the training examples is captured via inner products between pairs of training points in the Gram matrix $G = X X^\top$, with $G_{ij} = \langle x_i, x_j \rangle$. Since the computation only involves inner products, we can substitute all occurrences of inner products with a kernel function that computes $\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
This way we obtain an algorithm for ridge regression in the feature space defined by the mapping $\phi$.
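A sketch of this kernel trick: the Gram matrix of inner products is replaced by a kernel matrix, here an RBF kernel on hypothetical 1-D data (the bandwidth `gamma` and `lam` are arbitrary choices, not values from the text):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix K_ij = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2 * A @ B.T)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))     # toy 1-D inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
lam = 0.1

K = rbf_kernel(X, X)                     # Gram matrix in feature space
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_new):
    # f(x) = sum_i alpha_i * k(x, x_i): prediction via kernels only,
    # never forming phi(x) explicitly
    return rbf_kernel(X_new, X) @ alpha

print(np.mean((predict(X) - y) ** 2))    # small training error on the sine data
```

Note that $\phi$ never appears explicitly: training and prediction use only kernel evaluations between pairs of points.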