Learning to Memorize at Test Time
They focus on a long-term associative memory in which the goal is to store past data as pairs of keys and values. As in Transformers, they use two linear layers to project the input x_t into a key and a value: k_t = x_t W_K and v_t = x_t W_V.
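As a minimal sketch of the projection step (sizes and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_k = 8, 4                      # illustrative dimensions (assumptions)
x_t = rng.normal(size=(d_in,))        # current input token
W_K = rng.normal(size=(d_in, d_k))    # learned key projection
W_V = rng.normal(size=(d_in, d_k))    # learned value projection

# Linear projections, as in Transformer attention:
k_t = x_t @ W_K
v_t = x_t @ W_V
```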
Next, the memory module M is expected to learn the associations between keys and values. The loss is the MSE between the values and the values the memory reconstructs from the keys: loss(M; x_t) = || M(k_t) − v_t ||^2.
Accordingly, in the inner loop we optimize M's weights (at test time, by taking gradient steps on this loss), while in the outer loop we optimize the remaining parameters of the architecture (e.g., W_K, W_V, and the gate parameters).
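The inner-loop update can be sketched for the simplest case, a memory that is a single linear map M(k) = k @ M; the learning rate and sizes below are hypothetical, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 4                               # illustrative size (assumption)
M = np.zeros((d_k, d_k))              # linear memory: M(k) = k @ M
k_t = rng.normal(size=(d_k,))
v_t = rng.normal(size=(d_k,))
lr = 0.1                              # inner-loop step size (hypothetical)

def mem_loss(M):
    # MSE between the value and the memory's reconstruction from the key
    err = k_t @ M - v_t
    return float(err @ err)

loss_before = mem_loss(M)
grad = 2.0 * np.outer(k_t, k_t @ M - v_t)  # analytic gradient of the loss w.r.t. M
M = M - lr * grad                          # one test-time (inner-loop) gradient step
loss_after = mem_loss(M)
```

A single step already reduces the reconstruction error on this key-value pair; the outer loop would meanwhile train W_K and W_V by backpropagating through such steps.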
Adaptive Forgetting
When dealing with very long sequences, it is crucial to manage which past information should be forgotten. They add an adaptive forgetting mechanism that allows the memory to drop information that is no longer needed, making better use of the memory's limited capacity:

M_t = (1 − α_t) M_{t−1} + S_t,

where α_t in [0, 1] is a gating mechanism that flexibly controls the memory (α_t near 1 erases, α_t near 0 retains) and S_t is the surprise-based update.
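A sketch of one gated update for the linear-memory case, with fixed scalar gates for illustration (in the paper the gates are data-dependent; the specific values below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_k = 4
M_prev = rng.normal(size=(d_k, d_k))   # memory after the previous token
S_prev = np.zeros((d_k, d_k))          # momentum ("surprise") state
k_t = rng.normal(size=(d_k,))
v_t = rng.normal(size=(d_k,))

# Hypothetical gate values: forgetting, momentum decay, step size
alpha_t, eta_t, theta_t = 0.1, 0.9, 0.05

grad = 2.0 * np.outer(k_t, k_t @ M_prev - v_t)  # gradient of ||M(k_t) - v_t||^2
S_t = eta_t * S_prev - theta_t * grad           # momentum-smoothed surprise
M_t = (1.0 - alpha_t) * M_prev + S_t            # adaptive forgetting via alpha_t
```

Setting alpha_t close to 1 wipes most of the old memory before writing the new surprise, while alpha_t near 0 accumulates.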

Memory as a Context

The sequence is split into segments; for each segment, relevant history is retrieved from the long-term memory and prepended to the segment as extra context for an attention block.

Memory as a Gate

The input flows through a sliding-window attention branch and the long-term memory branch in parallel, and their outputs are combined with a gating mechanism.

Memory as a Layer

The memory module is stacked as a layer of the network, compressing the sequence before it is passed to attention.

LMM (long-term memory only model)

A variant that drops attention entirely and uses only the neural long-term memory module.

On the needle-in-a-haystack (NIAH) task, the LMM variant performs best, since it devotes all of its capacity to long-term memorization.
