While remaining an RNN, it can compute over multiple tokens in parallel at training time, the way a Transformer does.

Reformer: The Efficient Transformer
https://arxiv.org/abs/2001.04451
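As a minimal sketch of the idea (an illustration of parallelizable recurrences in general, not the method of the linked paper): a linear recurrence h_t = a * h_{t-1} + x_t, which a classic RNN must step through one token at a time, has the closed form h_t = sum over k <= t of a^(t-k) * x_k, so every timestep can be computed at once. The function names and the scalar decay `a` here are hypothetical choices for the example.

```python
import numpy as np

def rnn_sequential(x, a):
    """Step through timesteps one by one, as in classic RNN training."""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + x_t
        out.append(h)
    return np.array(out)

def rnn_parallel(x, a):
    """Compute every h_t at once via the closed form h_t = sum_{k<=t} a^(t-k) x_k."""
    T = len(x)
    t = np.arange(T)
    # weights[t, k] = a^(t-k) for k <= t, zero above the diagonal
    weights = np.tril(a ** (t[:, None] - t[None, :]))
    return weights @ x

x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(rnn_sequential(x, 0.5), rnn_parallel(x, 0.5))
```

The parallel form trades the O(T) sequential dependency for a single matrix product, which is why such recurrences train efficiently on accelerators; practical long-sequence variants use scan-style algorithms rather than the dense T x T weight matrix shown here.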