Window attention, where only the most recent KVs are cached, is a natural approach (a minimal sketch follows the links below).

Sparse attentions:
- Sliding window attention
- BigBird attention
- LSG attention
- Dynamic Sparse Attention
- Star Attention

Links:
- Hugging Face Reads, Feb. 2021 - Long-range Transformers: https://huggingface.co/blog/long-range-transformers
- LLMs May Not Need Dense Self Attention (Sink Tokens and the Sparsity of Attention Scores in Transformer Models): https://medium.com/@buildingblocks/llms-may-not-need-dense-self-attention-1fa3bf47522e
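To make the window-attention idea concrete, here is a minimal single-head sketch: each decode step appends the new key/value pair and evicts anything older than the window, so cache memory stays constant no matter how long the sequence grows. The class name `SlidingWindowKVCache` and its parameters are illustrative, not taken from any of the linked posts.

```python
import numpy as np

class SlidingWindowKVCache:
    """Keep only the most recent `window` key/value pairs (single head, illustrative)."""

    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.keys = np.empty((0, head_dim))    # (cached_len, head_dim)
        self.values = np.empty((0, head_dim))  # (cached_len, head_dim)

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Append this step's KV, then evict everything outside the window.
        self.keys = np.concatenate([self.keys, k])[-self.window:]
        self.values = np.concatenate([self.values, v])[-self.window:]

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Scaled dot-product attention over the windowed cache only.
        scores = q @ self.keys.T / np.sqrt(q.shape[-1])       # (1, cached_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)        # softmax
        return weights @ self.values                          # (1, head_dim)

# Decode 10 steps with a window of 4: the cache stays bounded at 4 entries.
cache = SlidingWindowKVCache(window=4, head_dim=8)
rng = np.random.default_rng(0)
for _ in range(10):
    k, v, q = (rng.standard_normal((1, 8)) for _ in range(3))
    cache.append(k, v)
    out = cache.attend(q)
print(cache.keys.shape)  # (4, 8): only the 4 most recent KVs remain
```

Note that pure windowing evicts the earliest tokens; the sink-token observation in the second link concerns exactly those early positions, which tend to attract a disproportionate share of attention scores.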