MoD


Mixture of Depths

Leaves the option to simply skip a layer's computation for a given token.
MoD dynamically allocates computation in transformer models, optimizing resource use while maintaining accuracy. It selectively processes complex tokens and skips simpler ones, cutting computational overhead significantly.
MoD checks each token's complexity within a sequence, applying computation selectively to those needing deeper processing. This strategy moves away from the traditional approach of uniformly allocating computation across all tokens.
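Concretely, each MoD layer can be read as a capacity-limited router: a small linear head scores every token, only the top-k tokens (e.g. 12.5% of the sequence) pass through the attention/MLP block, and the rest are carried forward unchanged on the residual stream. Below is a minimal PyTorch sketch of that idea; `MoDBlock`, `capacity_ratio`, and the sigmoid gating are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, d_model: int, block: nn.Module, capacity_ratio: float = 0.125):
        super().__init__()
        self.block = block                    # full transformer block (attention + MLP)
        self.router = nn.Linear(d_model, 1)   # scores how much each token needs compute
        self.capacity_ratio = capacity_ratio  # fraction of tokens routed through the block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        k = max(1, int(t * self.capacity_ratio))

        scores = self.router(x).squeeze(-1)              # (batch, seq_len)
        topk = scores.topk(k, dim=-1).indices            # tokens that receive compute
        topk = topk.sort(dim=-1).values                  # keep sequence order for causal attention

        idx = topk.unsqueeze(-1).expand(-1, -1, d)       # (batch, k, d_model)
        selected = torch.gather(x, 1, idx)               # gather only the routed tokens
        processed = self.block(selected)                 # run the expensive block on them

        # Gate the block's residual update by the router score so routing stays
        # differentiable, then scatter results back; skipped tokens pass through unchanged.
        gate = torch.sigmoid(torch.gather(scores, 1, topk)).unsqueeze(-1)
        return x.scatter(1, idx, selected + gate * (processed - selected))
```

Because the capacity k is fixed ahead of time, tensor shapes (and hence the compute savings) are known statically, which distinguishes this from early-exit schemes whose cost varies with the input.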
Unlike PAUSE Token, which virtually implements additional attention-layer computation, MoD removes an attention layer's computation as unnecessary when the context vector is already sufficient to predict the next token precisely.
