Gated retention (gRet, also known as gRetNet or RetNet-3) augments retention with a data-dependent gating mechanism. For sequence modeling, it aims to combine training parallelism, strong performance, and low inference cost.
Gated Retention
Creator: Seonglae Cho
Created: 2024 May 18 7:7
Editor: Seonglae Cho
Edited: 2024 May 18 7:8
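
The data-dependent gating described above can be illustrated with a small pure-Python sketch. The note itself gives no formulas, so everything here is an assumption based on the commonly described formulation: a per-token decay `gamma_t = sigmoid(x_t · w_gamma) ** (1 / tau)`, a recurrent state update `S_t = gamma_t * S_{t-1} + k_t^T v_t`, and an output `o_t = q_t S_t`. The function names (`gates`, `gret_recurrent`, `gret_parallel`) and the parameters `w_gamma` and `tau` are hypothetical, and details such as multiple heads, positional rotation, and output normalization are left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gates(x, w_gamma, tau=16.0):
    """Assumed gate: gamma_t = sigmoid(x_t . w_gamma) ** (1 / tau), each in (0, 1)."""
    return [
        sigmoid(sum(a * b for a, b in zip(x_t, w_gamma))) ** (1.0 / tau)
        for x_t in x
    ]


def gret_recurrent(q, k, v, gamma):
    """Recurrent form (constant-size state per step, as used at inference):
    S_t = gamma_t * S_{t-1} + k_t^T v_t,   o_t = q_t S_t
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # S_0 = 0
    outputs = []
    for q_t, k_t, v_t, g_t in zip(q, k, v, gamma):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = g_t * state[i][j] + k_t[i] * v_t[j]
        outputs.append(
            [sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def gret_parallel(q, k, v, gamma):
    """Parallel form (all positions computed from the full sequence, as in training):
    o_n = sum_{m <= n} (q_n . k_m) * D[n][m] * v_m,
    with D[n][m] = gamma_{m+1} * ... * gamma_n, computed via cumulative log-gates.
    """
    log_cum, total = [], 0.0
    for g in gamma:
        total += math.log(g)
        log_cum.append(total)
    d_v = len(v[0])
    outputs = []
    for n in range(len(q)):
        o_n = [0.0] * d_v
        for m in range(n + 1):  # causal: only positions m <= n contribute
            decay = math.exp(log_cum[n] - log_cum[m])
            score = sum(a * b for a, b in zip(q[n], k[m])) * decay
            for j in range(d_v):
                o_n[j] += score * v[m][j]
        outputs.append(o_n)
    return outputs
```

Under these assumptions the two functions compute the same outputs: the parallel form corresponds to the training-parallelism claim, while the recurrent form keeps only a fixed-size state per step, which is where the low inference cost comes from. Because the decay is computed from each token's input rather than fixed in advance, the gate is data-dependent.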