## Attention Matrix (QK)

Scaled dot-product attention, introduced in the Transformer architecture paper ("Attention Is All You Need"), is the most commonly used form. The attention score measures the similarity between a key vector (K) and a query vector (Q):

$$\mathrm{Attention} = \sum \langle K, Q \rangle$$

### Attention score functions

- Dot-Product Attention
- Bahdanau Attention
- Multiplicative Attention
- Additive Attention

Note that Bahdanau attention is a form of additive attention, while dot-product attention is the simplest multiplicative form. A sketch of scaled dot-product attention follows below.
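In the Transformer paper, the full scaled dot-product formula is $\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$. Below is a minimal NumPy sketch of that computation; the function name, toy shapes, and random inputs are illustrative assumptions, not from the original text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of scaled dot-product attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    Returns: (n_queries, d_v)
    """
    d_k = Q.shape[-1]
    # Similarity scores: inner products <Q, K>, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the keys -> attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is the attention-weighted sum of the value vectors
    return weights @ V

# Toy usage with random vectors (shapes are hypothetical)
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 8)
```

The $\sqrt{d_k}$ scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.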