Higher-Order Persistent Engine
- Deep Optimizers – Reinterprets Adam and similar optimizers as "deep memory modules" that compress gradient history, and proposes more expressive optimizers (e.g., Deep Momentum GD)
- Turns the momentum buffer from a simple vector into a small MLP or other nonlinear module, enabling richer compression and recall of past gradient patterns (see the first sketch after this list)
- Self-Modifying Titans – Sequence models that learn their own update rules and transform themselves
- Does this mean outer-loop training of a network that projects gradients? (One possible reading is sketched below, after this list)
- Continuum Memory System (CMS) – Generalizes short-term and long-term memory into a continuous spectrum, mimicking the brain's multi-timescale memory
- Applies different update frequencies to MLP blocks: fast layers act like short-term memory and slow layers like long-term memory, enabling continual learning across time scales (see the last sketch after this list)
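A minimal sketch of the Deep Momentum GD idea, assuming (this is my reading, not the paper's exact formulation) that the momentum vector is replaced by a tiny per-parameter MLP trained online to reconstruct recent gradients, whose output is then used as the update direction. The class name, hyperparameters, and inner loss are illustrative; the per-parameter `Linear(numel, hidden)` layers are far too large for real models and are only meant to show the mechanism.

```python
import torch


class DeepMomentumGD:
    """Sketch: the momentum buffer becomes a small MLP per parameter tensor."""

    def __init__(self, params, lr=1e-2, inner_lr=1e-2, hidden=32):
        self.params = [p for p in params if p.requires_grad]
        self.lr, self.inner_lr = lr, inner_lr
        # One small MLP ("deep momentum") per parameter tensor; it maps the
        # flattened gradient to an update direction of the same size.
        self.memories = [
            torch.nn.Sequential(
                torch.nn.Linear(p.numel(), hidden),
                torch.nn.Tanh(),
                torch.nn.Linear(hidden, p.numel()),
            )
            for p in self.params
        ]

    @torch.no_grad()
    def step(self):
        for p, mem in zip(self.params, self.memories):
            if p.grad is None:
                continue
            g = p.grad.detach().reshape(1, -1)
            # Inner step: train the memory MLP to reconstruct the current
            # gradient, so its weights accumulate gradient history (the role
            # a plain momentum buffer plays with a single vector).
            with torch.enable_grad():
                inner_loss = (mem(g) - g).pow(2).mean()
                inner_loss.backward()
            for w in mem.parameters():
                w -= self.inner_lr * w.grad
                w.grad = None
            # Outer step: use the memory's output as the descent direction.
            p -= self.lr * mem(g).reshape(p.shape)

    def zero_grad(self):
        for p in self.params:
            p.grad = None
```

Usage mirrors a standard optimizer: compute the loss, call `loss.backward()`, then `opt.step()` and `opt.zero_grad()`.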
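For the self-modifying question above, here is one plausible reading (not the paper's confirmed mechanism): the model predicts its own per-token learning rate and decay and uses them to rewrite a fast-weight memory at test time, while the projection and rule networks are trained in an outer loop by backpropagating through the unrolled inner updates. All names are illustrative.

```python
import torch


class SelfModifyingMemory(torch.nn.Module):
    """Sketch: the model emits its own update-rule coefficients per token and
    applies them to a fast-weight memory matrix during the forward pass."""

    def __init__(self, dim):
        super().__init__()
        self.key = torch.nn.Linear(dim, dim, bias=False)
        self.value = torch.nn.Linear(dim, dim, bias=False)
        self.query = torch.nn.Linear(dim, dim, bias=False)
        # The "self-modifying" part: the update rule's learning rate and decay
        # are predicted from the input rather than fixed as hyperparameters.
        self.rule = torch.nn.Linear(dim, 2)

    def forward(self, x):  # x: (seq_len, dim)
        memory = x.new_zeros(x.size(-1), x.size(-1))  # fresh fast weights
        outputs = []
        for token in x:
            lr, decay = torch.sigmoid(self.rule(token))
            k, v = self.key(token), self.value(token)
            # Token-conditioned update rule applied to the memory itself.
            memory = decay * memory + lr * torch.outer(v, k)
            outputs.append(memory @ self.query(token))
        return torch.stack(outputs)
```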
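Last, a minimal sketch of a continuum-memory-style update schedule, assuming (not stated in the note) that "fast" blocks are stepped every iteration while "slow" blocks accumulate gradients and are stepped only every `period` iterations, giving each block a different time scale. Block sizes, periods, and function names are illustrative.

```python
import torch


class CMSBlock(torch.nn.Module):
    def __init__(self, dim, period):
        super().__init__()
        self.period = period  # this block's parameters update every `period` steps
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.GELU(), torch.nn.Linear(dim, dim)
        )

    def forward(self, x):
        return x + self.net(x)  # residual MLP block


class ContinuumMemory(torch.nn.Module):
    def __init__(self, dim, periods=(1, 4, 16)):
        super().__init__()
        # period 1 ~ fast / short-term memory; larger periods ~ slower,
        # longer-term memory, giving a spectrum of update time scales.
        self.blocks = torch.nn.ModuleList(CMSBlock(dim, p) for p in periods)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


def make_optimizers(model, lr=1e-3):
    # One optimizer per block so each block can follow its own schedule.
    return [torch.optim.Adam(b.parameters(), lr=lr) for b in model.blocks]


def cms_step(model, optimizers, step):
    for block, opt in zip(model.blocks, optimizers):
        if step % block.period == 0:
            opt.step()       # fast blocks step every call, slow ones rarely
            opt.zero_grad()  # between steps, slow blocks accumulate gradients


# Toy training loop illustrating the multi-timescale schedule.
model = ContinuumMemory(dim=64)
optimizers = make_optimizers(model)
for step in range(1, 65):
    x = torch.randn(8, 64)
    loss = model(x).pow(2).mean()  # dummy objective
    loss.backward()
    cms_step(model, optimizers, step)
```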
Achieves lower perplexity and higher accuracy than baselines such as the Transformer, RetNet, and Titans on language modeling and commonsense reasoning benchmarks (PIQA, ARC, BoolQ, etc.).

Seonglae Cho