Fast Transformers with Clustered Attention
Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with...
https://arxiv.org/abs/2007.04825