MoD


Mixture of Depths

Gives each token the option to skip a layer entirely
MoD dynamically allocates computation in transformer models: it applies full processing only to tokens that need it and lets the rest bypass the layer, cutting FLOPs while preserving accuracy.
At each layer, a router scores every token in the sequence and routes only the top-scoring tokens through that layer's attention and MLP computation; the remaining tokens pass through via the residual connection. This departs from the traditional approach of spending the same amount of computation on every token.
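A minimal sketch of this per-layer top-k routing, assuming a learned linear router and a fixed capacity fraction; `MoDBlock`, `capacity_ratio`, and the wrapped `block` are illustrative names, not the paper's actual implementation.
```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Wraps a transformer sub-block so only the top-k routed tokens pass through it.
    `block` should return the layer's function output f(x) WITHOUT its own residual add,
    since the residual is applied here via scatter_add."""
    def __init__(self, d_model: int, block: nn.Module, capacity_ratio: float = 0.125):
        super().__init__()
        self.router = nn.Linear(d_model, 1)   # scores each token's need for compute
        self.block = block                    # wrapped attention/MLP computation
        self.capacity_ratio = capacity_ratio  # fraction of tokens processed per layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape                     # (batch, seq_len, d_model)
        k = max(1, int(s * self.capacity_ratio))
        scores = self.router(x).squeeze(-1)               # (b, s)
        topk = scores.topk(k, dim=-1).indices             # indices of routed tokens
        idx = topk.unsqueeze(-1).expand(-1, -1, d)        # (b, k, d)
        selected = x.gather(1, idx)                       # gather routed tokens
        # Scale by the router weight so the routing decision stays differentiable.
        weights = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        processed = self.block(selected) * weights
        # Routed tokens get the residual update; all others skip the block unchanged.
        return x.scatter_add(1, idx, processed)

# Usage: wrap any (b, k, d) -> (b, k, d) sub-block, e.g. a simple MLP.
layer = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
mod = MoDBlock(d_model=512, block=layer, capacity_ratio=0.125)
out = mod(torch.randn(2, 128, 512))  # (2, 128, 512)
```
Multiplying the block output by the router score keeps the routing decision in the gradient path, so the router can be trained end-to-end.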
Unlike a PAUSE Token, which virtually adds extra attention computation by inserting learnable tokens, MoD removes a layer's unnecessary computation whenever the context vector is already sufficient to predict the next token precisely.
