Query Key Normalization

Creator
Seonglae Cho
Created
2024 May 22 5:1
Edited
2024 May 22 5:2

QK Norm

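QK Norm (query-key normalization) is a normalization technique that modifies the attention mechanism so that the softmax function is less prone to arbitrary saturation, without sacrificing expressivity.
Specifically, it applies L2 normalization along the head dimension of each query and key matrix before multiplying them, then scales the result up by a learnable parameter instead of dividing by the square root of the head (key) dimension, as standard scaled dot-product attention does. Because the normalized queries and keys are unit vectors, each attention logit is a cosine similarity multiplied by the learned scale, so its magnitude is bounded by that scale.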
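
The description above can be sketched as a minimal single-head attention function in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the names qk_norm_attention, l2_normalize, and softmax are hypothetical, the scale g is passed as a plain float (in a real model it would be a learnable parameter), and masking, batching, and multiple heads are omitted.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def qk_norm_attention(queries, keys, values, g):
    """Single-head attention with QK Norm (illustrative sketch).

    queries, keys: lists of vectors of length d_head
    values: list of vectors, one per key
    g: scale applied to the logits; assumed learnable in a real model
    """
    # L2-normalize every query and key along the head dimension.
    q_hat = [l2_normalize(q) for q in queries]
    k_hat = [l2_normalize(k) for k in keys]

    outputs = []
    for q in q_hat:
        # Each logit is g * cosine similarity, so it lies in [-g, g].
        # Note there is no division by sqrt(d_head) here.
        scores = [g * sum(a * b for a, b in zip(q, k)) for k in k_hat]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs
```

One consequence of this design is that rescaling a query or key vector leaves the attention weights unchanged, since only the direction of each vector survives the normalization; the sharpness of the softmax is controlled by g alone.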
 
 
 
 
 
