FNet

FNet replaces self-attention in Transformer blocks with an FFT-based token-mixing layer that needs no attention weights. The Fourier mixing sublayer first applies a 1D FFT along the sequence dimension and then another along the hidden dimension, keeping only the real part of the result. This reduces the token-mixing complexity from O(n²) for self-attention to O(n log n) in the sequence length n.
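A minimal sketch of this mixing step in PyTorch (the helper name fourier_mix is illustrative, not from the paper):

```python
import torch

def fourier_mix(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, hidden), real-valued activations.
    # 1D FFT along the sequence dim, then along the hidden dim
    # (the two transforms commute, so fft2 computes both at once);
    # FNet keeps only the real part of the complex result.
    return torch.fft.fft2(x, dim=(-2, -1)).real

x = torch.randn(4, 128, 256)  # (batch, seq_len, hidden)
y = fourier_mix(x)            # same shape, tokens mixed with no learned parameters
```

Because the layer has no learned weights, it can drop in wherever the self-attention sublayer sat in the block, with the feed-forward sublayer unchanged.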