Engram

Creator: Seonglae Cho
Created: 2026 Jan 13 19:00
Edited: 2026 Jan 13 19:07

Conditional Memory via Scalable Lookup

A New Axis of Sparsity for Large Language Models
Engram · deepseek-ai · Updated 2026 Jan 13 19:07
To address the inefficiency of knowledge lookup that MoE (conditional computation) alone cannot solve, the paper proposes a new axis of sparsity: Conditional Memory. Existing Transformers lack a lookup primitive, so they must inefficiently reconstruct static knowledge through multiple layers of computation. Engram instead retrieves static knowledge directly via N-gram-based O(1) hash lookup, then adapts it to the current context with context-aware gating.
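A minimal PyTorch sketch of how such a conditional-memory layer could look; the module name, table size, n-gram length, hash scheme, and gating form are all illustrative assumptions, not the paper's actual design:

```python
import torch
import torch.nn as nn


class ConditionalMemory(nn.Module):
    """Hypothetical sketch: N-gram hash lookup into a large static memory table,
    blended into the hidden state by a context-aware gate."""

    def __init__(self, d_model: int, table_size: int = 1 << 20, ngram: int = 3):
        super().__init__()
        self.ngram = ngram
        self.table_size = table_size
        self.memory = nn.Embedding(table_size, d_model)   # large, static memory table
        self.gate = nn.Linear(2 * d_model, d_model)       # gate conditioned on context + memory

    def ngram_indices(self, token_ids: torch.Tensor) -> torch.Tensor:
        """Rolling polynomial hash of the trailing n-gram at each position.
        token_ids: (batch, seq) int64 -> (batch, seq) table indices."""
        idx = torch.zeros_like(token_ids)
        for k in range(self.ngram):
            prev = torch.roll(token_ids, shifts=k, dims=1)
            prev[:, :k] = 0                               # mask wrap-around at sequence start
            idx = idx * 1000003 + prev                    # arbitrary prime multiplier
        return idx.remainder(self.table_size)             # O(1) slot per position

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        mem = self.memory(self.ngram_indices(token_ids))  # deterministic retrieval
        g = torch.sigmoid(self.gate(torch.cat([hidden, mem], dim=-1)))
        return hidden + g * mem                           # context-gated injection
```

Note that the indices depend only on the raw token ids, not on activations, which is what makes the retrieval deterministic.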
Results show a Sparsity Allocation Law: there is a U-shaped trade-off between MoE and Engram parameters, with performance peaking when roughly 20-25% of the sparse parameter budget is allocated to Engram rather than to pure MoE. Early layers can skip static pattern processing, leaving more depth for reasoning. Long-context performance improves significantly (RULER, LongPPL). Because the lookup is deterministic, the memory can be offloaded to CPU/host memory, with a 100B memory incurring <3% overhead.
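Since the table indices are a pure function of the token ids, the rows needed for a step can be gathered from a host-memory table and copied to the GPU asynchronously. A hedged sketch of that idea (names, sizes, and the prefetch scheme are illustrative, not from the paper):

```python
import torch

d_model, table_size = 256, 1 << 18                       # illustrative sizes only
host_table = torch.randn(table_size, d_model)            # resides in CPU/host RAM
copy_stream = torch.cuda.Stream()

def prefetch(indices: torch.Tensor) -> torch.Tensor:
    """Gather memory rows on the host and start an async H2D copy on a side stream.
    The caller must synchronize with copy_stream before reading the result."""
    rows = host_table[indices.cpu()].pin_memory()         # deterministic gather, pinned for async copy
    with torch.cuda.stream(copy_stream):
        return rows.to("cuda", non_blocking=True)         # overlaps with compute on the default stream
```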
Memory Network
Unlike Memory Networks, Engram adds no computation overhead: retrieval is an O(1) deterministic hash lookup, so the memory can be scaled massively and integrated easily with modern Transformers and MoE.

www.arxiv.org
