
Sparse Attention

Creator: Seonglae Cho
Created: 2023 Oct 6 7:34
Editor: Seonglae Cho
Edited: 2024 Nov 27 14:55
Refs
LM Context Extending
Window attention, where only the most recent KVs are cached, is a natural approach to extending context length (sketched below).
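A minimal sketch of this idea, assuming a single head and a hypothetical window_size parameter (not any particular library's API): only the most recent window_size key/value pairs are kept in the cache, so per-step attention cost and memory stay constant during decoding.

```python
import torch
import torch.nn.functional as F

def decode_step(q, k_new, v_new, k_cache, v_cache, window_size=4):
    """One decoding step with a fixed-size sliding-window KV cache."""
    # Append the new key/value pair, then evict entries older than the window.
    k_cache = torch.cat([k_cache, k_new[None]])[-window_size:]
    v_cache = torch.cat([v_cache, v_new[None]])[-window_size:]
    scores = (q @ k_cache.T) / q.size(-1) ** 0.5  # attend only within the window
    out = F.softmax(scores, dim=-1) @ v_cache
    return out, k_cache, v_cache

d = 16
k_cache, v_cache = torch.empty(0, d), torch.empty(0, d)
for _ in range(10):  # cache length never exceeds window_size
    q, k, v = torch.randn(d), torch.randn(d), torch.randn(d)
    out, k_cache, v_cache = decode_step(q, k, v, k_cache, v_cache)
```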
Sparse Attention Variants
Native Sparse Attention
DeepSeek Sparse Attention
Sliding window attention
BigBird attention
LSG attention
Dynamic Sparse Attention
Star Attention
Hash Attention
Ball Attention
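These variants differ mainly in which query-key pairs the attention mask allows. As a hedged illustration (not any library's actual implementation), the sketch below builds a BigBird-style boolean mask from the three patterns BigBird combines: a sliding window, a few global tokens, and random links; window_size, n_global, and n_random are illustrative parameter names.

```python
import torch

def bigbird_style_mask(seq_len, window_size=3, n_global=1, n_random=2, seed=0):
    idx = torch.arange(seq_len)
    # Sliding window: positions within window_size of each other may attend.
    mask = (idx[:, None] - idx[None, :]).abs() < window_size
    # Global tokens: the first n_global positions attend to, and are
    # attended from, every position.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    # Random links: each query also attends to n_random random keys.
    g = torch.Generator().manual_seed(seed)
    rand = torch.randint(0, seq_len, (seq_len, n_random), generator=g)
    mask[torch.arange(seq_len)[:, None], rand] = True
    return mask  # (seq_len, seq_len) bool, True = may attend

print(bigbird_style_mask(8).int())  # each row stays sparse instead of dense
```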
Hugging Face Reads, Feb. 2021 - Long-range Transformers
https://huggingface.co/blog/long-range-transformers
LLMs May Not Need Dense Self Attention: Sink Tokens and the Sparsity of Attention Scores in Transformer Models
https://medium.com/@buildingblocks/llms-may-not-need-dense-self-attention-1fa3bf47522e
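The post above argues that attention scores concentrate on a few initial "sink" tokens, so dense attention may be unnecessary. A minimal sketch of the resulting pattern (in the spirit of StreamingLLM's attention sinks; n_sink and window_size are illustrative names) keeps the first few tokens plus a recent window visible:

```python
import torch

def sink_window_mask(seq_len, n_sink=2, window_size=4):
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]    # no attending to the future
    recent = (idx[:, None] - idx[None, :]) < window_size
    sink = idx[None, :] < n_sink             # initial sink tokens stay visible
    return causal & (recent | sink)          # True = may attend

print(sink_window_mask(8).int())
```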

Backlinks

Language Model
Language Model Context
Reversing Transformer

Copyright Seonglae Cho