FNet replaces Self-Attention in Transformer blocks with an FFT-based token-mixing layer that uses no attention at all. The Fourier mixing sublayer applies a 1D FFT along the sequence dimension and another along the hidden dimension, keeping only the real part of the result. This reduces the token-mixing complexity from O(N²) to O(N log N) in the sequence length.
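A minimal PyTorch sketch of this idea, assuming the standard residual + LayerNorm block layout; the class names `FourierMixing` and `FNetBlock` and the dimensions are illustrative, not from the note:

```python
import torch
import torch.nn as nn

class FourierMixing(nn.Module):
    """Token mixing via a 2D DFT: one 1D FFT along the hidden dimension,
    one along the sequence dimension, keeping only the real part."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden)
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

class FNetBlock(nn.Module):
    """Transformer block with the self-attention sublayer replaced by Fourier mixing."""
    def __init__(self, hidden: int, ff_dim: int):
        super().__init__()
        self.mixing = FourierMixing()
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(
            nn.Linear(hidden, ff_dim), nn.GELU(), nn.Linear(ff_dim, hidden)
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.mixing(x))  # mixing sublayer, residual + norm
        return self.norm2(x + self.ff(x))   # feed-forward sublayer, residual + norm

# Example: block = FNetBlock(hidden=768, ff_dim=3072); out = block(torch.randn(2, 128, 768))
```

The mixing step has no learnable parameters, so all learning happens in the feed-forward sublayers.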
FNet NAACL 2022
https://aclanthology.org/2022.naacl-main.319.pdf
Unlocking Gen AI at the Edge: Speeding up Transformers by 80% by Removing Self Attention
A deep dive into FNet, FFT-based mixing, and why the future of AI might belong to fixed-structure models that don’t even try to learn what they can encode.
https://artificialintelligencemadesimple.substack.com/p/speeding-up-transformers-by-80-by


Seonglae Cho