Fast Transformers with Clustered Attention
Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with...
https://arxiv.org/abs/2007.04825
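
The quadratic cost mentioned in the abstract can be illustrated with a minimal sketch of standard (full) softmax attention in pure Python. This is not the clustered attention the paper proposes. The names `softmax_attention` and `attention_matrix_entries` are hypothetical and exist only for this illustration.

```python
import math

def softmax_attention(Q, K, V):
    """Standard full softmax attention on lists of row vectors.

    Returns (output, weights). `weights` has one row per query and one
    column per key, so it holds N * N entries for a length-N sequence.
    """
    d = len(Q[0])
    weights = []
    for q in Q:
        # One score per key: this inner loop is what makes the cost quadratic.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Each output row is a weighted average of the value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights

def attention_matrix_entries(n):
    """Number of attention weights stored for a length-n sequence."""
    return n * n
```

Under these assumptions, doubling the sequence length quadruples the number of attention weights: `attention_matrix_entries(2048)` is four times `attention_matrix_entries(1024)`.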