Dilated Attention

Creator

Seonglae Cho

Created

2023 Jul 13 9:4

Editor

Seonglae Cho

Edited

2024 Mar 2 3:35

Refs

Dilated convolution

Microsoft’s LongNet Scales Transformer to One Billion Tokens

Scaling sequence length is of paramount importance for large language models, as it brings about significant benefits. These advantages…

https://medium.com/syncedreview/microsofts-longnet-scales-transformer-to-one-billion-tokens-af02ff657d87
