Sparse Attention

Creator: Seonglae Cho
Created: 2023 Oct 6 7:34
Editor: Seonglae Cho
Edited: 2024 Nov 27 14:55
Refs: LM Context Extending
Window attention, where only the most recent key-value (KV) pairs are cached, is a natural approach to reducing attention cost on long sequences.
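As a minimal sketch (PyTorch; not from the original page, and `window` is an illustrative hyperparameter), the causal sliding-window mask that such a KV cache implies looks like this:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: query i attends only to the `window` most recent
    key positions j, i.e. i - window < j <= i (causal)."""
    i = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1) query index
    j = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len) key index
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.int())  # each row keeps at most `window` KV entries
```

Because each query keeps at most `window` keys, the KV cache and the attention cost stay O(window) per token instead of growing with the full sequence length.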
Sparse Attention Variants
Native Sparse Attention
Sliding window attention
BigBird attention (see the mask sketch after this list)
LSG attention
Dynamic Sparse Attention
Star Attention
Hash Attention
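These variants differ mainly in which query-key pairs they keep. As a hedged sketch (PyTorch; the sizes `window`, `n_global`, and `n_random` are illustrative, not taken from any specific implementation), a BigBird-style mask is the union of a local window, a few global tokens, and random long-range links:

```python
import torch

def bigbird_style_mask(seq_len: int, window: int,
                       n_global: int, n_random: int,
                       seed: int = 0) -> torch.Tensor:
    """Union of the three patterns BigBird combines: a local sliding
    window, a few global tokens that attend to (and are attended by)
    everything, and random long-range links per query."""
    gen = torch.Generator().manual_seed(seed)
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    mask = (j - i).abs() < window             # local window band
    mask[:n_global, :] = True                 # global tokens as queries
    mask[:, :n_global] = True                 # global tokens as keys
    rand = torch.randint(seq_len, (seq_len, n_random), generator=gen)
    mask[torch.arange(seq_len).unsqueeze(1), rand] = True  # random links
    return mask

print(bigbird_style_mask(seq_len=10, window=2, n_global=1, n_random=2).int())
```

Each of the three patterns contributes O(n) nonzero entries, so the combined mask stays linear in sequence length while the random and global links preserve short paths between distant tokens.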
Hugging Face Reads, Feb. 2021 - Long-range Transformers
https://huggingface.co/blog/long-range-transformers

LLMs May Not Need Dense Self Attention: Sink Tokens and the Sparsity of Attention Scores in Transformer Models
https://medium.com/@buildingblocks/llms-may-not-need-dense-self-attention-1fa3bf47522e

Copyright Seonglae Cho