Presents a new method for rigorously separating the information language models "inadvertently" memorize from their training data from the information they gain through generalization, and for measuring each in bits.
- Definition of memorization: uses a Kolmogorov-complexity-style notion of description length to quantify, in bits, how much information a model θ stores about a specific sample x.
- By training on uniformly random bit sequences, where generalization is impossible, the authors measure pure "inadvertent memorization" and find a storage limit of roughly 3.6 bits per parameter.
- When the dataset's information content exceeds model capacity, the deep double descent phenomenon occurs and the model shifts from memorization toward stronger generalization. This resembles a neural network phase change and can be interpreted in the same frame as feature dimensionality and grokking.
- The success rate of loss-based membership inference follows a sigmoid as a function of the ratio between model capacity and dataset size.
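The bits-of-memorization definition can be approximated as a compression difference: the description length of x under a reference model (which can only generalize) minus its description length under the target model θ. A minimal sketch, where the per-token probabilities are made-up illustrative values, not numbers from the paper:

```python
import math

def bits(prob):
    """Shannon code length (in bits) for an event with this probability."""
    return -math.log2(prob)

# Hypothetical per-token probabilities assigned to a sample x by a
# reference model (generalization only, never trained on x) and by the
# target model theta (trained on x). All values are illustrative.
p_ref   = [0.20, 0.10, 0.25, 0.05]
p_theta = [0.90, 0.85, 0.95, 0.80]

# Description length of x under each model (arithmetic-coding bound).
len_ref   = sum(bits(p) for p in p_ref)
len_theta = sum(bits(p) for p in p_theta)

# Memorized information: bits saved by encoding x with theta instead of
# the reference -- a compression-based stand-in for the paper's
# Kolmogorov-complexity definition.
memorized_bits = max(0.0, len_ref - len_theta)
print(f"{memorized_bits:.2f} bits memorized about x")
```

Information the reference model also predicts well is "generalization" and cancels out; only the extra bits θ stores about this specific sample count as memorization.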
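The sigmoid relationship in the last bullet can be sketched as a logistic curve in the capacity-to-data ratio. The functional form, the slope constant, and the helper name `mia_success` are illustrative assumptions, not the paper's fitted curve; only the 3.6 bits/parameter figure comes from the source:

```python
import math

def mia_success(num_params, dataset_bits, bits_per_param=3.6):
    """Toy model of loss-based membership-inference success rate.

    Assumed logistic form: success rises from chance (0.5) toward 1.0
    as total model capacity (params * bits/param) overtakes the
    information content of the dataset. Slope 8.0 is arbitrary.
    """
    ratio = (num_params * bits_per_param) / dataset_bits
    return 0.5 + 0.5 / (1.0 + math.exp(-8.0 * (ratio - 1.0)))

# Capacity far below dataset size: inference is near chance.
print(mia_success(num_params=1e6, dataset_bits=1e9))
# Capacity far above dataset size: inference approaches certainty.
print(mia_success(num_params=1e9, dataset_bits=1e6))
```

The qualitative point is that membership inference only works well once the model has enough spare capacity to memorize individual samples, which matches the memorization/generalization transition described above.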