Kernel Function

Creator: Seonglae Cho
Created: 2023 Apr 17 7:39
Edited: 2025 Mar 24 17:41

Generalized Distance or Vector Similarity

The positive-definite kernel, denoted as $K(x, z)$, calculates the inner product of feature-mapped inputs. The notation $\langle \cdot, \cdot \rangle$ represents Vector Similarity (the inner product), where $\phi$ is the feature mapping function:

$$K(x, z) = \langle \phi(x), \phi(z) \rangle$$
Core Properties:
  1. Symmetry - a valid kernel function must satisfy $K(x, z) = K(z, x)$
  2. Semi-definiteness - the kernel (Gram) matrix must be positive semi-definite
Mathematical Conditions:
  • Positive Semi-definiteness: $\sum_{i}\sum_{j} c_i c_j K(x_i, x_j) \ge 0$ for any coefficients $c_1, \dots, c_n \in \mathbb{R}$ and any inputs $x_1, \dots, x_n$
Key Features and Applications:
  • Enables classification of nonlinear data using linear classifiers through higher-dimensional mapping
  • Functions as a Feature Map that computes inner products between transformed input data (see the sketch below)
  • Maintains nonlinear characteristics, since orthogonal relationships are rare in high-dimensional spaces
  • Generates additional dimensions from the existing dimensional data
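As a concrete illustration, the degree-2 polynomial kernel evaluates the feature-space inner product without ever constructing the feature map explicitly. A minimal sketch (the explicit map `phi` and the helper names are illustrative, not from the source):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2D inputs:
    phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: K(x, z) = <x, z>^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Both routes give the same number: the kernel computes the inner product
# in the 3D feature space without ever materializing phi.
assert np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z)))
print(poly_kernel(x, z))  # 121.0
```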

Theory

Dual solution (dual representation)

In ridge regression, we minimize the regularized cost

$$J(w) = \frac{1}{2}\sum_{n=1}^{N}\left(w^\top \phi(x_n) - t_n\right)^2 + \frac{\lambda}{2}\, w^\top w$$

Setting the derivative of the cost function with respect to $w$ to 0 yields the following equation:

$$w = -\frac{1}{\lambda}\sum_{n=1}^{N}\left(w^\top \phi(x_n) - t_n\right)\phi(x_n)$$

Alternatively, we can rewrite the equation in terms of $a = (a_1, \dots, a_N)^\top$:

$$w = \sum_{n=1}^{N} a_n \phi(x_n) = \Phi^\top a$$

where $a_n = -\frac{1}{\lambda}\left(w^\top \phi(x_n) - t_n\right)$. $w$ is thus a linear combination of the training examples. Dual representation refers to learning by expressing the model parameters as a linear combination of training samples instead of learning them directly (primal representation).
The dual representation with proper regularization enables an efficient solution when $p > N$ (Sample-to-feature ratio), as the complexity of the problem depends on the number of examples $N$ instead of on the number of input dimensions $p$.
We have two distinct methods for solving the ridge regression optimization (verified numerically in the sketch below):
  • Primal solution (explicit weight vector): $w = (\Phi^\top \Phi + \lambda I_p)^{-1} \Phi^\top t$, which inverts a $p \times p$ matrix
  • Dual solution (linear combination of training examples): $a = (\Phi \Phi^\top + \lambda I_N)^{-1} t$ with $w = \Phi^\top a$, which inverts an $N \times N$ matrix
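A minimal numerical sketch of this equivalence, assuming a random design matrix $\Phi$ purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 500                   # far more features than examples (p > N)
Phi = rng.normal(size=(N, p))    # design matrix of feature-mapped inputs
t = rng.normal(size=N)           # targets
lam = 0.1                        # regularization strength

# Primal solution: invert a p x p matrix.
w_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ t)

# Dual solution: invert only an N x N matrix, then recover w = Phi^T a.
a = np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), t)
w_dual = Phi.T @ a

assert np.allclose(w_primal, w_dual)  # same weights either way
```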
The crucial observation about the dual solution is that the information from the training examples is captured via inner products between pairs of training points in the Gram matrix $G = \Phi \Phi^\top$, with entries $G_{nm} = \langle \phi(x_n), \phi(x_m) \rangle$. Since the computation only involves inner products, we can substitute all occurrences of inner products by a kernel function that computes:

$$K(x, z) = \langle \phi(x), \phi(z) \rangle$$

This way we obtain an algorithm for ridge regression in the feature space defined by the mapping $\phi$, without ever computing $\phi$ explicitly: predictions for a new point $x$ likewise need only kernel evaluations, $y(x) = w^\top \phi(x) = \sum_{n=1}^{N} a_n K(x_n, x)$.
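A minimal sketch of kernel ridge regression along these lines, using a Gaussian (RBF) kernel; the kernel choice, `gamma`, `lam`, and the toy data are assumptions for illustration:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

def fit_dual(X, t, kernel, lam=1e-2):
    """Solve the dual system (G + lam * I_N) a = t: only N x N,
    regardless of the (possibly infinite) feature-space dimension."""
    G = kernel(X, X)  # Gram matrix of pairwise kernel values
    return np.linalg.solve(G + lam * np.eye(len(X)), t)

def predict(X_train, a, X_new, kernel):
    """y(x) = sum_n a_n K(x_n, x): prediction uses only kernel evaluations."""
    return kernel(X_new, X_train) @ a

# Toy data: noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
t = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

a = fit_dual(X, t, rbf_kernel)
print(predict(X, a, np.array([[0.5]]), rbf_kernel))  # approx sin(0.5) ~ 0.48
```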