
Attention Canceling

Created: 2024 Oct 26 11:34
Creator: Seonglae Cho
Editor: Seonglae Cho
Edited: 2024 Nov 3 0:1
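Attention canceling computes two attention maps with separately learned weights and subtracts one from the other, so that the attention score common to both maps, treated as noise on irrelevant context, cancels out.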
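The subtraction can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation from the Differential Transformer paper: the function names (`softmax`, `attention_map`, `diff_attention`) are hypothetical, `lam` is a fixed scalar here (in the paper it is a learnable value), and the per-head normalization the paper applies after the subtraction is omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)), row by row."""
    d = len(Q[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        for q in Q
    ]


def diff_attention(Q1, K1, Q2, K2, V, lam):
    """Subtract a second attention map (scaled by lam) from the first.

    Assumption: lam is a plain float; scores that both maps share shrink
    in the difference, scores unique to the first map survive.
    Returns (differential weights, weighted sum of V).
    """
    A1 = attention_map(Q1, K1)
    A2 = attention_map(Q2, K2)
    weights = [
        [a1 - lam * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)
    ]
    out = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return weights, out


# Toy case: one query, three tokens.
# Token 0 attracts attention in BOTH maps (shared noise),
# token 2 attracts attention only in the first map (relevant context).
Q = [[1.0]]
K1 = [[2.0], [0.0], [2.0]]
K2 = [[2.0], [0.0], [0.0]]
V = [[1.0], [0.0], [5.0]]

weights, out = diff_attention(Q, K1, Q, K2, V, lam=0.5)
print(weights[0])  # token 2 now outweighs token 0
```

In the first map alone, tokens 0 and 2 receive equal weight; after the subtraction, token 2 dominates because only the shared part of the score is removed. Each row of the differential weights sums to `1 - lam` rather than 1, and individual entries can be negative, which is one reason a real implementation would rescale or normalize the result.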
Differential Transformer
Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise...
https://arxiv.org/abs/2410.05258

Copyright Seonglae Cho