Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015; ~10,000 citations)
https://arxiv.org/pdf/1508.04025.pdf
- Global attention: soft attention over every encoder hidden state, scored by a simple multiplicative (dot-product) similarity between the decoder state and each encoder output.
- Local attention: a hard-attention-style window around a chosen source position, with soft attention applied only inside that window (a hybrid of hard and soft attention). A minimal sketch of both is below.
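As a rough illustration (not the paper's code), here is a minimal NumPy sketch of the two variants under simplifying assumptions: global attention uses the paper's "dot" score over all source positions, while local attention restricts the softmax to a fixed window; the names `encoder_states`, `center`, and `window_radius` are invented for this example, and the paper's local-p variant additionally predicts the window center and applies a Gaussian weighting, which is omitted here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def global_attention(decoder_state, encoder_states):
    """Soft attention over all encoder states (Luong 'dot' score)."""
    scores = encoder_states @ decoder_state   # (S,) dot-product alignment scores
    weights = softmax(scores)                 # soft distribution over all positions
    return weights @ encoder_states           # context vector, shape (H,)

def local_attention(decoder_state, encoder_states, center, window_radius=2):
    """Soft attention inside a hard window around `center` (local-m style)."""
    lo = max(0, center - window_radius)
    hi = min(len(encoder_states), center + window_radius + 1)
    window = encoder_states[lo:hi]            # hard selection of a local window
    weights = softmax(window @ decoder_state) # soft attention within the window
    return weights @ window

# Toy usage: 6 source positions, hidden size 4.
rng = np.random.default_rng(0)
enc = rng.standard_normal((6, 4))
dec = rng.standard_normal(4)
print(global_attention(dec, enc))
print(local_attention(dec, enc, center=3))
```

The point of the contrast: global attention costs O(S) per decoder step over the whole source, while local attention caps the softmax at the window size, which is why the paper proposes it for long sequences.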