Attention Mechanism Optimizations
Sparse Attention, restricting each query to a subset of key positions, such as a local sliding window, to cut the quadratic cost of full attention (see the first sketch after this list)
Monarch Mixer, replacing attention and MLP blocks with sub-quadratic structured Monarch matrices
Flash Attention, an IO-aware tiled kernel that computes exact attention without materializing the full attention matrix in GPU memory
Dilated Attention, attending over strided (dilated) segments to extend context length, as in LongNet
PagedAttention, storing the KV cache in fixed-size blocks indexed by a per-sequence block table to reduce memory fragmentation during serving (sketch below)
Grouped Query Attention, sharing each key/value head across a group of query heads (sketch below)
Multi-Query Attention, the extreme case of grouped-query attention, in which a single key/value head serves all query heads
Clustered Attention, grouping similar queries into clusters and computing attention once per cluster centroid
Layer-Selective Rank Reduction (LASER), replacing selected weight matrices with low-rank approximations
KV Cache, storing the keys and values of already-processed tokens during autoregressive decoding so that each new token attends without recomputing them (sketch below)
FAVOR+, the Performer's random-feature approximation of softmax attention with linear rather than quadratic complexity (sketch below)
Chunk Attention, splitting the KV cache into chunks, for example to share common-prefix chunks across requests
Memory-efficient Attention, computing exact attention one key/value chunk at a time with a running softmax, so the full score matrix never resides in memory (sketch below)
Gated Attention, applying a learned elementwise gate to the attention output
FlexAttention, PyTorch's API for expressing attention variants through user-defined score-modification functions (usage example below)
Selective Attention, letting tokens reduce the attention paid to context tokens judged no longer relevant
FireAttention, an optimized attention kernel for high-throughput inference serving
FNet, replacing self-attention entirely with unparameterized Fourier transforms for token mixing (sketch below)
Multi-head Attention Optimization, kernel-level fusions and restructurings of standard multi-head attention
Sigmoid Attention, replacing the traditional softmax with an elementwise sigmoid and a constant bias (sketch below)
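
The sketches below make several of these techniques concrete. All of them are minimal PyTorch illustrations under simplifying assumptions; every function and variable name is invented for illustration, and none reflects any library's actual implementation. First, sparse attention in its simplest causal sliding-window form. This sketch materializes the full score matrix and then masks it, which shows the access pattern but not the speedup; a real sparse kernel skips the masked work entirely.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=64):
    # q, k, v: (batch, heads, seq, head_dim)
    n = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    pos = torch.arange(n, device=q.device)
    diff = pos[:, None] - pos[None, :]      # query index minus key index
    keep = (diff >= 0) & (diff <= window)   # causal, local window only
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```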
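
Grouped Query Attention and Multi-Query Attention differ only in how many query heads share each key/value head. A minimal sketch, assuming the usual (batch, heads, seq, dim) layout; grouped_query_attention is an invented name:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim)
    groups = q.size(1) // k.size(1)          # query heads per KV head
    k = k.repeat_interleave(groups, dim=1)   # expand the shared KV heads
    v = v.repeat_interleave(groups, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = torch.randn(1, 8, 32, 64)   # 8 query heads
k = torch.randn(1, 2, 32, 64)   # 2 KV heads: groups of 4 (GQA)
v = torch.randn(1, 2, 32, 64)   # with 1 KV head this degenerates to MQA
out = grouped_query_attention(q, k, v)
```

The benefit is a smaller KV cache: memory scales with the number of KV heads, not query heads.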
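
The KV cache is the standard decoding optimization: keys and values of earlier tokens are computed once and reused at every step. A minimal sketch with invented names (KVCache, decode_step); real implementations preallocate the cache rather than concatenating:

```python
import torch
import torch.nn.functional as F

class KVCache:
    """Append-only cache of past keys/values for one attention layer."""
    def __init__(self):
        self.k = self.v = None

    def append(self, k_new, v_new):
        # k_new, v_new: (batch, heads, 1, dim) for the token being decoded
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

def decode_step(q_new, k_new, v_new, cache):
    # The new token attends to all cached positions; no causal mask is
    # needed because the query is the final position by construction.
    k, v = cache.append(k_new, v_new)
    return F.scaled_dot_product_attention(q_new, k, v)
```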
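
FAVOR+ approximates exp(q.k / sqrt(d)) with positive random features so that attention can be computed in time linear in sequence length. The sketch below keeps only the core trick; the full method also uses orthogonal random features and periodic redrawing, which are omitted here:

```python
import torch

def positive_features(x, proj):
    # phi(x) = exp(x W^T - ||x||^2 / 2) / sqrt(m); scaling q and k by
    # d**-0.25 makes phi(q) . phi(k) approximate exp(q.k / sqrt(d)).
    x = x * x.size(-1) ** -0.25
    sq = (x ** 2).sum(-1, keepdim=True) / 2
    return torch.exp(x @ proj.T - sq) / proj.size(0) ** 0.5

def favor_attention(q, k, v, proj):
    qp, kp = positive_features(q, proj), positive_features(k, proj)
    kv = kp.transpose(-2, -1) @ v                    # (..., m, dim)
    den = qp @ kp.sum(dim=-2, keepdim=True).transpose(-2, -1)
    return (qp @ kv) / den                           # linear in seq length

proj = torch.randn(256, 64)   # 256 random features for head dim 64
```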
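
PagedAttention's key data structure is a block table mapping logical token positions to small physical cache blocks, so a sequence needs no large contiguous allocation and blocks can be shared or freed independently. A hypothetical sketch of just the bookkeeping (all class and constant names are invented; this is not vLLM's code):

```python
BLOCK_SIZE = 16          # tokens per physical KV-cache block (assumed)

class BlockAllocator:
    """Shared pool of physical cache blocks."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

class SequenceCache:
    """Per-sequence page table mapping logical positions to blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []

    def slot(self, pos):
        # Grow the page table on demand, one block at a time.
        while pos // BLOCK_SIZE >= len(self.block_table):
            self.block_table.append(self.allocator.alloc())
        return self.block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

pool = BlockAllocator(1024)
seq = SequenceCache(pool)
print(seq.slot(0), seq.slot(17))   # two slots in two different blocks
```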
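
Memory-efficient attention processes keys and values chunk by chunk while maintaining a running maximum, numerator, and denominator for the softmax, so the result is exact but the full n-by-n score matrix is never materialized. The same online-softmax idea, combined with tiling tuned to GPU SRAM, is also the core of FlashAttention. A minimal sketch:

```python
import torch

def chunked_attention(q, k, v, chunk=128):
    # Exact attention over key/value chunks with a running, numerically
    # stable softmax; only one chunk of scores exists at any time.
    scale = q.size(-1) ** -0.5
    m = torch.full_like(q[..., :1], float("-inf"))   # running row max
    num = torch.zeros_like(q)                        # running numerator
    den = torch.zeros_like(q[..., :1])               # running denominator
    for i in range(0, k.size(-2), chunk):
        kc, vc = k[..., i:i + chunk, :], v[..., i:i + chunk, :]
        s = q @ kc.transpose(-2, -1) * scale
        m_new = torch.maximum(m, s.amax(-1, keepdim=True))
        p = torch.exp(s - m_new)
        fix = torch.exp(m - m_new)                   # rescale old terms
        num = num * fix + p @ vc
        den = den * fix + p.sum(-1, keepdim=True)
        m = m_new
    return num / den
```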
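
FlexAttention is a PyTorch API (available in recent releases, roughly 2.5 onward): the user supplies a score_mod callback that rewrites individual attention scores given their batch, head, query, and key indices, and torch.compile can fuse it into an efficient kernel. A small usage example implementing a causal mask through score_mod:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod receives each raw score and its (batch, head, query, key)
# indices, and returns the modified score before the softmax.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))
out = flex_attention(q, k, v, score_mod=causal)
```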
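
FNet drops attention entirely: token mixing is done by Fourier transforms along the hidden and sequence dimensions, keeping only the real part. The mixing step has no learned parameters at all:

```python
import torch

def fnet_mixing(x):
    # x: (batch, seq, hidden). Two FFTs replace the attention sublayer.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

x = torch.randn(2, 128, 512)
mixed = fnet_mixing(x)   # same shape as x, tokens now mixed
```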
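
Finally, sigmoid attention keeps the dot-product scores but applies an elementwise sigmoid with a constant bias instead of the row-wise softmax, so the attention weights for a query no longer need to sum to one. A minimal sketch, taking -log n (n being the sequence length) as the constant bias; treat that specific choice as an assumption here:

```python
import math
import torch

def sigmoid_attention(q, k, v):
    n = k.size(-2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Elementwise sigmoid with a constant bias replaces the softmax,
    # removing the row-wise normalization across keys.
    attn = torch.sigmoid(scores - math.log(n))
    return attn @ v
```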