Kernel Function

Creator: Seonglae Cho
Created: 2023 Apr 17 7:39
Edited: 2025 Jun 2 15:58

Generalized Distance or Vector Similarity

The positive-definite kernel K(x, z) computes the inner product of feature-mapped inputs. The angle-bracket notation \langle \cdot, \cdot \rangle denotes Vector Similarity (an inner product), and \phi(x) is the kernel's feature mapping function:
K_{nm} = K_{mn} = K(x_n, x_m) = \langle \phi(x_n), \phi(x_m) \rangle
Core Properties:
  1. Symmetry: a valid kernel function must be symmetric, K(x, z) = K(z, x)
  2. Positive semi-definiteness: the kernel (Gram) matrix must be positive semi-definite
Mathematical Conditions:
  • Positive semi-definiteness: \sum_i \sum_j c_i c_j \, k(x_i, x_j) \ge 0 for any choice of c_i and x_i (checked numerically below)
K(x, z) = \langle \phi(x), \phi(z) \rangle = \phi(x) \cdot \phi(z)
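As a small illustration of these conditions, here is a minimal NumPy sketch (my own example, assuming an RBF kernel and random 2-D inputs) that checks the Gram matrix of a valid kernel is symmetric and has no negative eigenvalues, which is equivalent to \sum_i \sum_j c_i c_j k(x_i, x_j) \ge 0:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # 50 random sample points in 2 dimensions (arbitrary choice)

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2), a standard positive-definite kernel."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Gram matrix K_ij = k(x_i, x_j) on the sample
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                     # symmetry
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # positive semi-definite up to round-off
```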
Key Features and Applications:
  • Enables classification of nonlinear data with linear classifiers by mapping inputs to a higher-dimensional feature space (see the sketch after this list)
  • Works together with a Feature Map: the kernel computes inner products between the transformed input data without evaluating \phi explicitly
  • Preserves nonlinear structure, since exactly orthogonal relationships are rare in high-dimensional spaces
  • Generates additional feature dimensions from the existing input dimensions
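To make the higher-dimensional mapping concrete, the following sketch (my own illustration, assuming 2-D inputs and the homogeneous polynomial kernel of degree 2) shows that k(x, z) = (x \cdot z)^2 computes exactly the inner product of the explicit feature map \phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2), without ever forming \phi:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))   # inner product in the 3-D feature space
print(poly_kernel(x, z))        # same value, computed directly in the 2-D input space
```

Both lines print 1.0: the kernel evaluates the feature-space inner product using only the original coordinates.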

Theory

Dual solution (dual representation)

For ridge regression, the dual coefficients are found as
\alpha^* = \underset{\alpha}{\arg\min} \; \frac{1}{n^2} \sum_{i,j=1}^n (\alpha_i - y_i)(\alpha_j - y_j) k(\mathbf{x}_i, \mathbf{x}_j) + \lambda \|\boldsymbol{\alpha}\|^2
Setting the derivative of the cost function with respect to \mathbf{w} to 0 yields the following equations:
\mathbf{X}^\top (\mathbf{X}\mathbf{w} - \mathbf{y}) + \lambda \mathbf{w} = 0
(\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})\,\mathbf{w} = \mathbf{X}^\top \mathbf{y}
\mathbf{w} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}
Alternatively, we can rewrite the same condition in terms of \boldsymbol{\alpha}:
\mathbf{X}^\top (\mathbf{X}\mathbf{w} - \mathbf{y}) + \lambda \mathbf{w} = 0
\mathbf{w} = \sum_i \alpha_i \mathbf{x}_i = \mathbf{X}^\top \boldsymbol{\alpha}, \quad \text{with } \boldsymbol{\alpha} = \tfrac{1}{\lambda}(\mathbf{y} - \mathbf{X}\mathbf{w})
\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}
where \mathbf{K}_{ij} = \mathbf{x}_i^\top \mathbf{x}_j; the last line follows by substituting \mathbf{w} = \mathbf{X}^\top \boldsymbol{\alpha} back into the definition of \boldsymbol{\alpha}. \mathbf{w} is thus a linear combination of the training examples. Dual representation refers to learning by expressing the model parameters \mathbf{w} as a linear combination of training samples instead of learning them directly (the primal representation).
The dual representation with proper regularization enables an efficient solution when p > N (Sample-to-feature ratio), since the complexity of the problem depends on the number of examples N instead of on the number of input dimensions p.
We have two distinct methods for solving the ridge regression optimization (a numerical check of their equivalence follows the list):
  • Primal solution (explicit weight vector):
\mathbf{w} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}
  • Dual solution (linear combination of training examples):
\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}, \quad \mathbf{w} = \mathbf{X}^\top \boldsymbol{\alpha}
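As a quick numerical sanity check (a sketch with synthetic data, not part of the original derivation), the two formulas produce the same weight vector when \mathbf{K} = \mathbf{X}\mathbf{X}^\top is the plain inner-product Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 5                     # N examples, p features (arbitrary sizes)
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
lam = 0.1                        # ridge regularization strength

# Primal solution: solve a p x p linear system
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Dual solution: solve an N x N linear system, then recover w = X^T alpha
K = X @ X.T                      # K_ij = x_i . x_j
alpha = np.linalg.solve(K + lam * np.eye(N), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))   # True: both routes give the same weights
```

The dual route only ever touches the N x N matrix K, which is why it is preferable when p > N.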
The crucial observation about the dual solution is that the information from the training examples is captured via inner products between pairs of training points in the matrix \mathbf{K} = \mathbf{X}\mathbf{X}^\top. Since the computation only involves inner products, we can substitute every occurrence of an inner product with a kernel function K that computes:
K_{nm} = K_{mn} = K(x_n, x_m) = \langle \phi(x_n), \phi(x_m) \rangle
This way we obtain an algorithm for ridge regression in the feature space F defined by the mapping \phi : x \rightarrow \phi(x) \in F.
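Putting the pieces together, here is a minimal kernel ridge regression sketch (my own example; the RBF kernel, the toy sine data, and the hyperparameters are all assumptions) that fits \boldsymbol{\alpha} on training data and predicts on new points using only kernel evaluations, never the explicit feature map \phi:

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(40, 1))                    # 1-D inputs
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=40)   # noisy regression target
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
lam, gamma = 0.1, 0.5                                          # assumed hyperparameters

def rbf(A, B, gamma):
    """Gram matrix of the RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

# Dual fit: alpha = (K + lambda I)^{-1} y, with K_nm = k(x_n, x_m)
K = rbf(X_train, X_train, gamma)
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

# Prediction: f(x) = sum_n alpha_n k(x_n, x), i.e. inner products in feature space only
y_pred = rbf(X_test, X_train, gamma) @ alpha
print(np.round(y_pred, 3))   # approximates sin(x) at the test points
```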
 
 
 
 
 
 
 
