Query Key Normalization

Creator

Seonglae Cho

Created

2024 May 22 5:1

Editor

Seonglae Cho

Edited

2024 May 22 5:2

Refs

QK Norm

A normalization technique that modifies the attention mechanism so that the softmax is less prone to arbitrary saturation, without sacrificing expressivity.

Specifically, apply L2 normalization along the head dimension of each query and key matrix before multiplying them, then scale the product up by a learnable parameter instead of dividing by the square root of the embedding dimension.
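A minimal NumPy sketch of the idea, assuming single-precision details, initialization of the gain, and any per-head parameterization are out of scope. The function names (`l2_normalize`, `qk_norm_attention`, `standard_attention`) are hypothetical, and a fixed scalar `g` stands in for the learned parameter. Because queries and keys are unit vectors after normalization, every logit is `g` times a cosine similarity and stays within [-g, g]. The baseline divides by the square root of the per-head key dimension, as in standard scaled dot-product attention.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-6) -> np.ndarray:
    """Scale each vector along `axis` to unit L2 norm (eps guards against zero vectors)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax: subtract the row max before exponentiating."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def qk_norm_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, g: float):
    """QK-normalized attention (sketch).

    q: (..., seq_q, d_head), k: (..., seq_k, d_head), v: (..., seq_k, d_v).
    g: scalar gain; learnable in the real technique, a fixed value here.
    Returns (output, attention_weights).
    """
    q_hat = l2_normalize(q, axis=-1)  # unit-norm queries along the head dimension
    k_hat = l2_normalize(k, axis=-1)  # unit-norm keys along the head dimension
    # Cosine similarities lie in [-1, 1]; g scales them up instead of 1/sqrt(d).
    scores = g * (q_hat @ np.swapaxes(k_hat, -1, -2))
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def standard_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Baseline scaled dot-product attention for comparison."""
    d_head = q.shape[-1]
    scores = (q @ np.swapaxes(k, -1, -2)) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(5, 8))
    k = rng.normal(size=(7, 8))
    v = rng.normal(size=(7, 3))

    # Inflate query magnitudes: the baseline softmax drifts toward one-hot,
    # while QK norm is invariant to the scale of q and k.
    _, w_std = standard_attention(50.0 * q, k, v)
    _, w_qk = qk_norm_attention(50.0 * q, k, v, g=10.0)
    print("baseline max weight per row:", w_std.max(axis=-1).round(3))
    print("QK norm  max weight per row:", w_qk.max(axis=-1).round(3))
```

The design point the sketch illustrates: with unnormalized dot products, the logit range grows with the magnitudes of `q` and `k`, so the softmax can saturate regardless of content. After L2 normalization the logit range is fixed at 2g, so saturation is controlled by a single parameter that training can adjust.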


https://arxiv.org/pdf/2010.04245
