Global attention
soft attention over all source hidden states: each encoder output is scored against the current decoder state, the scores are softmax-normalized, and the states are elementwise weighted and summed into a context vector
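A minimal NumPy sketch of this global soft attention, using the simple dot-product score; shapes, sizes, and the random states here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Assumed toy shapes: T source positions, hidden size d
T, d = 5, 8
rng = np.random.default_rng(0)
h_s = rng.normal(size=(T, d))   # all encoder (source) hidden states
h_t = rng.normal(size=d)        # current decoder (target) hidden state

# Dot-product alignment score for every source position
scores = h_s @ h_t              # shape (T,)
a_t = softmax(scores)           # soft alignment weights over ALL positions
c_t = a_t @ h_s                 # context vector: weighted sum of source states
```

Every source position gets a nonzero weight, which is what makes this "global" and fully differentiable.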
Local Attention
hard selection of a small source window (predict an aligned position, keep only positions near it) combined with soft attention inside that window
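A sketch of the predictive local variant under stated assumptions: the aligned position p_t is hard-coded here (the paper learns to predict it), and the soft scores inside the window of radius D are damped by a Gaussian centered at p_t with sigma = D/2.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Assumed toy shapes: T source positions, hidden size d, window radius D
T, d, D = 10, 8, 2
rng = np.random.default_rng(1)
h_s = rng.normal(size=(T, d))   # encoder hidden states
h_t = rng.normal(size=d)        # current decoder hidden state

p_t = 5.0                       # assumed aligned position (learned in the paper)
lo = max(0, int(p_t) - D)
hi = min(T, int(p_t) + D + 1)
window = np.arange(lo, hi)      # hard window: positions outside get zero weight

scores = h_s[window] @ h_t      # soft scores only inside the window
gauss = np.exp(-((window - p_t) ** 2) / (2 * (D / 2) ** 2))
a_t = softmax(scores) * gauss   # soft weights favoring positions near p_t
c_t = a_t @ h_s[window]         # context vector from the window only
```

The hard cut keeps the cost independent of source length, while the Gaussian keeps the weighting smooth around the predicted position.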
2015, 10000+ citations
Effective Approaches to Attention-based Neural Machine Translation
Seonglae Cho