Scaled Attention

Creator
Seonglae Cho
Created
2024 Feb 28 3:55
Edited
2024 Feb 28 4:0

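Scaling controls the variance of the attention logits so that the softmax output does not collapse toward a one-hot vector.

This normalization is important to have.
The variance of the individual q and k components is much smaller than the variance of their dot product: the dot product sums over d_k terms, so if the components are independent with zero mean and unit variance, q·k has variance d_k.
Dividing the logits by the square root of the key dimension, sqrt(d_k), brings that variance back to about 1.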
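
The effect can be checked numerically. The following is a minimal NumPy sketch, not taken from the original note: it assumes q, k and v are drawn from a standard normal distribution, and the function name `scaled_dot_product_attention`, the `scale` flag, and the sizes `n = 64`, `d_k = 256` are illustrative choices. It computes softmax(q k^T / sqrt(d_k)) v with and without the scaling and compares the logit variance and how peaked the attention weights are.

```python
import numpy as np


def softmax(x):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def scaled_dot_product_attention(q, k, v, scale=True):
    # q, k, v: arrays of shape (n, d_k). Illustrative helper, not a library API.
    d_k = q.shape[-1]
    logits = q @ k.T
    if scale:
        # Divide by the square root of the key dimension.
        logits = logits / np.sqrt(d_k)
    weights = softmax(logits)
    return weights @ v, weights, logits


# Assumption: unit-variance Gaussian inputs, so each raw logit has variance about d_k.
rng = np.random.default_rng(0)
n, d_k = 64, 256
q = rng.standard_normal((n, d_k))
k = rng.standard_normal((n, d_k))
v = rng.standard_normal((n, d_k))

_, w_raw, logits_raw = scaled_dot_product_attention(q, k, v, scale=False)
_, w_scaled, logits_scaled = scaled_dot_product_attention(q, k, v, scale=True)

print("logit variance, unscaled:", logits_raw.var())
print("logit variance, scaled:  ", logits_scaled.var())
print("mean largest weight per row, unscaled:", w_raw.max(axis=-1).mean())
print("mean largest weight per row, scaled:  ", w_scaled.max(axis=-1).mean())
```

Under these assumptions the unscaled logits should have variance near d_k and the scaled ones near 1, and the largest attention weight per row should be much closer to 1 without scaling, which is the near one-hot behaviour the scaling is meant to avoid.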
 
 
 

Difference

[image]
 
 
 
 
 
 
