Neural Tangent Kernel

Creator: Seonglae Cho
Created: 2024 Jul 8 9:53
Edited: 2024 Jul 8 14:42

NTK theory

Neural networks are heavily over-parameterized yet generalize well, which resembles kernel methods. NTK theory approximates a Neural Network as a linear model through a first-order Taylor Series expansion of the output in the parameters, which mirrors the gradients computed by Back Propagation. Under this view, back propagation's gradient flow can always find a solution that drives the loss to zero, because NTK theory considers an ANN with infinitely many nodes (parameters) trained with an infinitesimally small learning rate. It is called "tangent" because the kernel describes how the network output (y) changes linearly (i.e., in a tangent-like manner) with respect to small changes in the network's weights. The Universal Approximation Theorem explains a network's expressive ability, while NTK theory focuses on gradient flow and training dynamics.
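A minimal sketch of that linearization in standard NTK notation (the symbols below are assumed for illustration, not taken from this note): the network output $f(x;\theta)$ is expanded around the initialization $\theta_0$,

$$
f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),
\qquad
K(x, x') = \nabla_\theta f(x;\theta_0)^{\top}\, \nabla_\theta f(x';\theta_0),
$$

so the model is linear in $\theta$, and the tangent kernel $K$ is the inner product of the parameter gradients for two inputs.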
As a result, this linearity means we can write a closed-form solution for the training dynamics, and this closed-form solution depends critically on the neural tangent kernel. Each element of the neural tangent kernel is the inner product of the parameter-derivative vectors for a pair of training examples. This can be calculated for any network and is called the empirical NTK. If we let the width become infinite, we can obtain closed-form solutions, which are referred to as analytical NTKs.
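A minimal runnable sketch of the empirical NTK described above, using JAX; the tiny tanh MLP, its widths, and the helper names are illustrative assumptions rather than anything specified in this note.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths=(2, 64, 1)):
    # Random parameters for a small fully connected network (assumed architecture).
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, w_key = jax.random.split(key)
        params.append((jax.random.normal(w_key, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def forward(params, x):
    # Scalar-output MLP with tanh hidden activations; x is a single example.
    h = x
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b).squeeze()

def empirical_ntk(params, x1, x2):
    # K[i, j] = <grad_theta f(x1_i), grad_theta f(x2_j)>:
    # the inner product of parameter-gradient vectors for each pair of examples.
    def flat_grad(x):
        g = jax.grad(lambda p: forward(p, x))(params)
        return jnp.concatenate([leaf.ravel() for leaf in jax.tree_util.tree_leaves(g)])
    j1 = jax.vmap(flat_grad)(x1)   # (n1, num_params)
    j2 = jax.vmap(flat_grad)(x2)   # (n2, num_params)
    return j1 @ j2.T               # (n1, n2) empirical NTK

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (8, 2))
K = empirical_ntk(params, x, x)    # 8 x 8 kernel on the sample
print(K.shape)
```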

Fourier features

Analyzing gradient descent through the NTK lets us understand the convergence properties of neural networks. Decomposing the dynamics of the output with the chain rule under a least-squares loss yields an ODE governed by the NTK. In the over-parameterized regime the kernel is positive-definite, so it admits an eigenvalue decomposition, and each eigenvalue gives the convergence rate of the network along the corresponding eigendirection.
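As a hedged illustration of that ODE in standard NTK notation (assumed here, not from this note): write the training outputs as $f_t = f(X;\theta_t)$ with targets $y$; under squared loss, gradient flow gives

$$
\frac{d f_t}{dt} = -K\,(f_t - y),
\qquad
f_t - y = e^{-Kt}(f_0 - y) = \sum_i e^{-\lambda_i t}\, v_i v_i^{\top}(f_0 - y),
$$

where $K = \sum_i \lambda_i v_i v_i^{\top}$ is the (constant, positive-definite) NTK. Each eigencomponent of the error decays at rate $\lambda_i$, so directions with small eigenvalues converge slowly.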
It can be seen that Fourier features help a neural network learn high-frequency information better. Natural data generally carry large magnitudes at low frequencies and small magnitudes at high frequencies. Analyzing the NTK of inputs passed through Fourier features shows that its eigenvalue falloff is no steeper than that of a plain MLP's NTK, which indicates the network can sufficiently learn even the high-frequency components.
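A minimal sketch of the Fourier feature mapping referred to above, again in JAX; the frequency matrix B and its scale sigma are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def fourier_features(x, B):
    # Map inputs to [cos(2*pi*xB^T), sin(2*pi*xB^T)] so the MLP (and hence its NTK)
    # sees explicit high-frequency components of the input.
    proj = 2.0 * jnp.pi * x @ B.T
    return jnp.concatenate([jnp.cos(proj), jnp.sin(proj)], axis=-1)

key = jax.random.PRNGKey(0)
sigma = 10.0                                  # larger sigma -> higher sampled frequencies
B = sigma * jax.random.normal(key, (256, 2))  # 256 random frequencies for 2-D inputs
x = jax.random.uniform(key, (8, 2))
z = fourier_features(x, B)                    # (8, 512) featurized inputs fed to the network
print(z.shape)
```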
 
 
