Kolmogorov-Arnold Networks
Effectively prevents catastrophic forgetting and is a candidate replacement for the MLP blocks of a transformer. Kolmogorov-Arnold Networks are smaller and more interpretable, but their complex activation functions demand more computational resources. It turns out that you can write a Kolmogorov-Arnold Network as an MLP, with some repeats and shifts before the ReLU.
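A minimal sketch of that rewrite, assuming piecewise-linear splines on a fixed grid (the layer and variable names are illustrative, not from any particular repository): each input is repeated once per grid point, shifted by that grid point, and passed through ReLU, so the final linear layer can mix the hinges into a learnable piecewise-linear activation per edge.

```python
import torch
import torch.nn as nn

class PiecewiseLinearKANLayer(nn.Module):
    """KAN edge functions rewritten as an MLP: repeat each input once per
    grid point, shift, apply ReLU, then take a learned linear combination."""
    def __init__(self, in_dim, out_dim, grid):
        super().__init__()
        self.register_buffer("grid", grid)          # knot positions, shape (k,)
        self.linear = nn.Linear(in_dim * grid.numel(), out_dim)

    def forward(self, x):                           # x: (batch, in_dim)
        # "repeats": one copy of every input per grid point
        x = x.unsqueeze(-1).expand(-1, -1, self.grid.numel())
        # "shift before ReLU": each copy becomes a hinge at one knot
        x = torch.relu(x - self.grid)               # (batch, in_dim, k)
        # the linear layer mixes the hinges into piecewise-linear activations
        return self.linear(x.flatten(1))

layer = PiecewiseLinearKANLayer(4, 8, torch.linspace(-1, 1, 5))
print(layer(torch.randn(2, 4)).shape)               # torch.Size([2, 8])
```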
The performance issue of the original implementation comes from expanding all intermediate variables in order to apply a different activation function to each one. This repository closes that efficiency gap by formulating all activation functions as linear combinations of a fixed set of basis functions.
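A sketch of the linear-combination idea, using a Gaussian radial basis as a stand-in for the B-splines real implementations use (all names here are illustrative): the shared basis is evaluated once per input, and every learned activation then collapses into a single einsum instead of per-edge expansions.

```python
import torch
import torch.nn as nn

class BasisKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-1, 1, n_basis))
        # one coefficient per (output, input, basis function)
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):                           # x: (batch, in_dim)
        # evaluate the shared basis once for every input...
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.1)
        # ...then each learned activation is just a linear combination,
        # so the whole layer reduces to one contraction
        return torch.einsum("bik,oik->bo", phi, self.coef)

layer = BasisKANLayer(4, 8)
print(layer(torch.randn(2, 4)).shape)               # torch.Size([2, 8])
```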
Wav-KAN: Wavelet Kolmogorov-Arnold Networks, which use wavelets (e.g., Mexican-hat or Morlet mother wavelets) as the learnable activation basis instead of splines.
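A hedged sketch of the wavelet idea, assuming a Mexican-hat mother wavelet with a learnable scale and translation per edge; this illustrates the concept and is not Wav-KAN's actual code.

```python
import torch
import torch.nn as nn

class WaveletKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(out_dim, in_dim))
        self.shift = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)

    def forward(self, x):                           # x: (batch, in_dim)
        # dilate and translate the input per (output, input) edge
        z = (x.unsqueeze(1) - self.shift) / self.scale    # (batch, out, in)
        # Mexican-hat mother wavelet: (1 - z^2) * exp(-z^2 / 2)
        psi = (1 - z ** 2) * torch.exp(-0.5 * z ** 2)
        # weighted sum over inputs gives each output unit
        return (self.weight * psi).sum(-1)

layer = WaveletKANLayer(4, 8)
print(layer(torch.randn(2, 4)).shape)               # torch.Size([2, 8])
```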
MLPs' downstream-task performance is generally better, while KANs are better at representing symbolic formulas.
Kolmogorov–Arnold Transformer (KAT), which replaces the transformer's MLP blocks with Group-Rational KAN layers: channels are divided into groups, and each group shares the coefficients of a rational activation function.
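A minimal sketch of a group-rational activation under that description; the polynomial degrees and the positive-denominator parameterization are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GroupRationalActivation(nn.Module):
    def __init__(self, channels, groups=4, p_deg=3, q_deg=2):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        # numerator and denominator coefficients, shared within each group
        self.p = nn.Parameter(torch.randn(groups, p_deg + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(groups, q_deg) * 0.1)

    def forward(self, x):                           # x: (batch, channels)
        b, c = x.shape
        x = x.view(b, self.groups, c // self.groups)
        # numerator polynomial P(x), one coefficient set per group
        num = sum(self.p[:, i].view(1, -1, 1) * x ** i
                  for i in range(self.p.shape[1]))
        # denominator Q(x) kept positive so the rational stays finite
        den = 1 + sum(self.q[:, i].view(1, -1, 1).abs() * x.abs() ** (i + 1)
                      for i in range(self.q.shape[1]))
        return (num / den).view(b, c)

act = GroupRationalActivation(channels=16, groups=4)
print(act(torch.randn(2, 16)).shape)                # torch.Size([2, 16])
```

Sharing one rational function per group (rather than one per channel) is what keeps the parameter and compute overhead close to a plain MLP block.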