Sparse Attention

- Sliding window attention
- BigBird attention
- LSG attention
- Dynamic Sparse Attention
- LLMs May Not Need Dense Self-Attention: https://medium.com/@buildingblocks/llms-may-not-need-dense-self-attention-1fa3bf47522e
- Sink Tokens and the Sparsity of Attention Scores in Transformer Models
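As a rough illustration of the first item: sliding window attention sparsifies self-attention by letting each token attend only to a fixed local neighborhood instead of the full sequence. A minimal sketch of the corresponding attention mask (the function name `sliding_window_mask` is my own, not from any of the works listed):

```python
def sliding_window_mask(seq_len, window):
    # True where query position i may attend to key position j,
    # i.e. |i - j| <= window (symmetric local neighborhood).
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(6, 1)
# each row allows at most 2 * window + 1 positions,
# so cost grows linearly in seq_len rather than quadratically
```

Variants like BigBird and LSG combine such a local window with a few global (or "sink") tokens that every position may attend to.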