Dilated Attention

Creator: Seonglae Cho
Created: 2023 Jul 13 9:4
Editor: Seonglae Cho
Edited: 2024 Mar 2 3:35
Refs: Dilated convolution
Microsoft’s LongNet Scales Transformer to One Billion Tokens
Scaling sequence length is of paramount importance for large language models, as it brings about significant benefits. These advantages…
https://medium.com/syncedreview/microsofts-longnet-scales-transformer-to-one-billion-tokens-af02ff657d87
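In LongNet, dilated attention splits the input sequence into segments and keeps only every r-th position inside each segment before computing standard attention, so the per-segment cost drops from O(w²) to O((w/r)²); several (segment length, dilation) pairs are mixed across heads so that every position is still covered. Below is a minimal single-head sketch of this sparsification idea; the function name, arguments, and zero-filled scatter-back are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, segment_len=4, dilation=2):
    # Split (batch, seq, dim) inputs into segments of `segment_len`,
    # then keep every `dilation`-th position inside each segment.
    b, n, d = q.shape
    assert n % segment_len == 0, "sequence length must divide into segments"
    qs = q.view(b, -1, segment_len, d)[:, :, ::dilation]
    ks = k.view(b, -1, segment_len, d)[:, :, ::dilation]
    vs = v.view(b, -1, segment_len, d)[:, :, ::dilation]
    # Standard scaled dot-product attention on the sparsified segments:
    # cost per segment is O((segment_len / dilation)^2) instead of O(segment_len^2).
    scores = qs @ ks.transpose(-1, -2) / d ** 0.5
    out_sparse = F.softmax(scores, dim=-1) @ vs
    # Scatter the attended outputs back to their original positions; skipped
    # positions stay zero in this simplified sketch (LongNet instead mixes
    # several dilation rates and offsets across heads so every position is
    # attended by some head).
    out = torch.zeros_like(q).view(b, -1, segment_len, d)
    out[:, :, ::dilation] = out_sparse
    return out.view(b, n, d)

x = torch.randn(1, 8, 16)
print(dilated_attention(x, x, x).shape)  # torch.Size([1, 8, 16])
```

Because each segment attends only to its own subsampled positions, the overall cost grows linearly in sequence length, which is what lets LongNet scale toward billion-token contexts.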
 
 

Copyright Seonglae Cho