Star Attention

Creator: Seonglae Cho
Created: 2024 Nov 29 21:35
Editor: Seonglae Cho
Edited: 2024 Nov 29 21:36
1. Context Encoding

Blockwise-local attention across distributed hosts: the context is split into blocks, and each host processes its block independently, so no inter-host communication is needed in this phase.

2. Query Encoding and Token Generation

Query and response tokens use sequence-global attention to access all cached tokens across hosts (see the sketch after this list).
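
As a rough illustration of the two phases, here is a minimal single-process NumPy sketch, not the paper's implementation: Phase 1 runs plain attention inside each block, and Phase 2 merges per-host partial attentions over the cached KVs using log-sum-exp statistics so the merged result equals exact global softmax attention. Function names, shapes, and block sizes are hypothetical, and the anchor-block trick and the real distributed communication are simplified away.

```python
# Minimal NumPy sketch of Star Attention's two phases (illustrative only; the
# paper additionally prepends an "anchor block" to every block in Phase 1 and
# runs each block on a separate host -- both simplified away here).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def phase1_context_encoding(blocks):
    """Phase 1: blockwise-local attention. Each context block attends only to
    itself, so hosts need no communication; each host keeps its own KV cache."""
    kv_cache = []
    for q, k, v in blocks:                       # one (Q, K, V) triple per host
        _ = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v   # local hidden states
        kv_cache.append((k, v))                  # KVs stay resident on the host
    return kv_cache

def phase2_global_attention(q, kv_cache):
    """Phase 2: sequence-global attention. Each host attends the query to its
    local KV cache and returns a partial output plus a log-sum-exp statistic;
    softmax over those statistics recovers exact global attention."""
    d = q.shape[-1]
    partial_out, partial_lse = [], []
    for k, v in kv_cache:                        # in parallel on each host
        scores = q @ k.T / np.sqrt(d)            # (n_q, block_len)
        m = scores.max(axis=-1, keepdims=True)
        partial_lse.append(m + np.log(np.exp(scores - m).sum(axis=-1, keepdims=True)))
        partial_out.append(softmax(scores) @ v)  # (n_q, d)
    weights = softmax(np.stack(partial_lse), axis=0)       # renormalise across hosts
    return (weights * np.stack(partial_out)).sum(axis=0)   # (n_q, d)

# Usage with random tensors standing in for real model states.
rng = np.random.default_rng(0)
d = 8
blocks = [tuple(rng.normal(size=(4, d)) for _ in range(3)) for _ in range(3)]
cache = phase1_context_encoding(blocks)
query = rng.normal(size=(1, d))
print(phase2_global_attention(query, cache).shape)  # (1, 8)
```

The merge step works because global softmax attention over all tokens factors into per-block partial sums: weighting each host's partial output by the softmax of its log-sum-exp reproduces the exact global result without any host ever holding the full sequence.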
Star Attention: Efficient LLM Inference over Long Sequences
Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star...
https://arxiv.org/abs/2411.17116

Copyright Seonglae Cho