AI Memory Capacity

Creator
Seonglae Cho
Created
2025 Jun 9 0:48
Edited
2025 Jun 9 1:17
This work presents a method for rigorously separating and measuring two kinds of information in a language model: what it "inadvertently" memorizes from its training data, and what it gains through generalization.
  • Definition of Memory: Uses Kolmogorov complexity to quantify, in bits, how much information model θ stores about a specific sample x.
  • Experiments on random bit sequences, where generalization is impossible, isolate pure "memorization" (inadvertent memory) and reveal a storage limit of approximately 3.6 bits per parameter.
  • Loss-based membership inference performance follows a sigmoid pattern as a function of how the model's memorization capacity (set by model size) compares with the data size.
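
The capacity figure and the bit-level definition of memory above can be illustrated with a small sketch. It is not the original method's code: the function names are hypothetical, and `memorized_bits` assumes that the Kolmogorov-complexity term is approximated by the model's code length (negative log2-likelihood) on a uniformly random sequence, which is one common way to make the quantity computable.

```python
import math

# Approximate storage limit quoted in the note above (bits per parameter).
BITS_PER_PARAM = 3.6


def capacity_bits(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Rough total memorization capacity of a model, in bits."""
    return num_params * bits_per_param


def max_random_sequences(num_params: int, seq_len_bits: int) -> int:
    """How many uniformly random sequences of seq_len_bits fit in that capacity."""
    return int(capacity_bits(num_params) // seq_len_bits)


def memorized_bits(token_probs: list[float], vocab_size: int) -> float:
    """Estimate the bits a model stores about one uniformly random sequence.

    Assumption (not from the note): the cost of the sequence without the model
    is len * log2(vocab_size), and the cost given the model is the sum of
    -log2 p(token) under the model. The difference is the memorized amount.
    """
    baseline = len(token_probs) * math.log2(vocab_size)
    code_length = -sum(math.log2(p) for p in token_probs)
    return max(0.0, baseline - code_length)


if __name__ == "__main__":
    print(capacity_bits(1_000_000))
    print(max_random_sequences(1_000_000, 64))
    # A model that predicts every random bit with certainty has stored all 4 bits.
    print(memorized_bits([1.0, 1.0, 1.0, 1.0], vocab_size=2))
    # A model that guesses at chance has stored nothing.
    print(memorized_bits([0.5, 0.5, 0.5, 0.5], vocab_size=2))
```

Under these assumptions, a 1,000,000-parameter model would hold roughly 3.6 million bits, enough for 56,250 random 64-bit sequences before its capacity is exhausted.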
