FNet

FNet replaces self-attention in Transformer blocks with an FFT-based token-mixing layer that needs no attention weights. The Fourier mixing sublayer first applies a 1D FFT along the sequence dimension and then another along the hidden dimension, keeping only the real part of the result. This reduces the token-mixing complexity from O(n²) for self-attention to O(n log n) in the sequence length n.
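A minimal sketch of this mixing step in PyTorch (the helper name fourier_mix is illustrative, not from the paper):

```python
import torch

def fourier_mix(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, hidden), real-valued activations.
    # 1D FFT along the sequence dim, then along the hidden dim
    # (the two transforms commute, so fft2 computes both at once);
    # FNet keeps only the real part of the complex result.
    return torch.fft.fft2(x, dim=(-2, -1)).real

x = torch.randn(4, 128, 256)  # (batch, seq_len, hidden)
y = fourier_mix(x)            # same shape, tokens mixed with no learned parameters
```

Because the layer has no learned weights, it can drop in wherever the self-attention sublayer sat in the block, with the feed-forward sublayer unchanged.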