AI Memorization

Creator: Seonglae Cho
Created: 2025 Nov 13 0:40
Edited: 2026 Jun 12 20:36

Understanding Memorization via Loss Curvature
Loss Function

While it is known that models memorize parts of their training data, it was unclear whether memorization and general reasoning are stored in structurally different weight directions within the model. This research uses loss curvature to separate the weight directions tied to memorization from those tied to general computation and reasoning, and tests whether reasoning ability is preserved when only the memorization directions are removed.
Unlike BalancedSubnet, this approach does not require specifying which data to erase. In other words, it identifies the structural signature of where the model stores memorization overall: directions used for reasoning are kept, and only memorization directions are removed.

K-FAC for curvature approximation

For a single-sample loss, a memorized sample sits in a sharp minimum: the loss on that sample has high curvature, so its prediction breaks under small weight changes. This method, however, uses the loss curvature over the whole dataset. Because each memorized sample is sharp only along its own idiosyncratic direction, that sharpness is diluted once the curvature is averaged over many samples. Low curvature therefore marks directions that are barely used and relate to only a few specific (memorized) samples, while moderate curvature marks structures shared across many samples, that is, general abilities such as reasoning, language understanding, and attention.
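The curvature-based separation above can be sketched for a single linear layer. This is a minimal illustration, not the paper's implementation: it assumes the standard K-FAC approximation (layer Fisher ≈ A ⊗ G, with A the input-activation covariance and G the output-gradient covariance), and the function names and the count-based `keep_fraction` rule are hypothetical choices made for this example.

```python
import numpy as np

def kfac_factors(acts, grads):
    """Estimate the two K-FAC factors from per-sample statistics.

    acts:  (n, d_in)  inputs to the linear layer
    grads: (n, d_out) loss gradients w.r.t. the layer outputs
    """
    n = acts.shape[0]
    A = acts.T @ acts / n      # (d_in, d_in) activation covariance
    G = grads.T @ grads / n    # (d_out, d_out) gradient covariance
    return A, G

def prune_low_curvature(W, A, G, keep_fraction=0.5):
    """Zero the components of W that lie along low-curvature directions.

    W has shape (d_out, d_in). In the K-FAC eigenbasis, the direction
    u_G[i] u_A[j]^T has approximate curvature lam_G[i] * lam_A[j].
    keep_fraction (hypothetical rule) is the share of directions kept,
    ranked from highest curvature down.
    """
    lam_a, U_a = np.linalg.eigh(A)
    lam_g, U_g = np.linalg.eigh(G)
    coeffs = U_g.T @ W @ U_a               # W expressed in the eigenbasis
    curvature = np.outer(lam_g, lam_a)     # curvature of each basis direction
    k = max(1, int(round(keep_fraction * curvature.size)))
    threshold = np.sort(curvature, axis=None)[-k]
    mask = curvature >= threshold          # True = direction is kept
    W_pruned = U_g @ (coeffs * mask) @ U_a.T
    return W_pruned, mask
```

With `keep_fraction=1.0` the weight matrix is reconstructed unchanged; lowering it removes the flattest directions first, which is where this account places memorization.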
Pure memorization: performance collapses to the 3 to 16% level.
Logical reasoning: almost fully preserved, or slightly improved.
Tasks fall along a spectrum from memorization to reasoning: arithmetic and factual recall sit nearest the memorization end, QA sits in the middle, and logical reasoning sits at the reasoning end.
Mathematics: the reasoning process remains intact but mistakes occur in calculation, meaning that accurate arithmetic relies heavily on memorization-based structures.
Understanding Memorization via Loss Curvature
Our new paper proposes a method to identify and suppress memorized content in models. This explainer provides an overview of our work.

HUBBLE
hubble (allegro-lab), updated 2026 May 6 10:33

HUBBLE was motivated by the need to precisely control the frequency and timing of data insertion at the 1B and 8B parameter scales, in order to identify the causes and mechanisms of memorization.
HUBBLE is based on the Llama architecture, and consists of a “Standard” model trained on a typical English corpus and a “Perturbed” model in which specific sensitive data is strategically inserted. To measure the degree of memorization, it uses the normalized log-likelihood $NL(x)$ as its main metric, which evaluates the predictive probability that the model $\theta$ assigns to a given text $x$ as follows.
$$NL(x) = \frac{1}{|x|} \sum_{i=1}^{|x|} \log p_\theta(x_i \mid x_{<i})$$
Here, $|x|$ denotes the length of the sequence and $x_i$ denotes the $i$-th token; the higher this value is, the more strongly the model is memorizing the corresponding data.
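As a rough illustration of the normalized log-likelihood metric, the sketch below averages per-token log-probabilities over a sequence. It assumes a `logits` array in which row `i` already holds the model's scores for the token at position `i` (that is, any shifting between inputs and targets has been done beforehand); the function name is hypothetical and not taken from the HUBBLE codebase.

```python
import numpy as np

def normalized_log_likelihood(logits, token_ids):
    """Average log-probability the model assigns to each observed token.

    logits:    (T, V) scores; row i scores the token at position i (assumption)
    token_ids: (T,)   observed token ids
    """
    logits = np.asarray(logits, dtype=np.float64)
    token_ids = np.asarray(token_ids)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of each observed token, then average.
    token_log_probs = log_probs[np.arange(len(token_ids)), token_ids]
    return float(token_log_probs.mean())
```

Values are always at most zero; a value closer to zero means the model finds the text more predictable, which is read here as stronger memorization.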
From an experimental-design perspective, HUBBLE inserts texts from various domains (Gutenberg book passages, Wikipedia, YAGO biographical data, MMLU, etc.) into the training data with randomly assigned duplication counts ranging from 1 to 256. It also includes “Timing run” models that control the stage of training (Early, Middle, Late) at which the data is exposed, enabling tracking of memorization and forgetting dynamics over training time.
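The design described above, randomized duplication counts plus a controlled exposure window, might be sketched as follows. Everything here is hypothetical and for illustration only: the function name, the splitting of training into equal thirds, and the example duplication levels are assumptions, not HUBBLE's actual configuration.

```python
import random

# Hypothetical exposure windows, expressed in thirds of the training run.
WINDOWS = {"full": (0, 3), "early": (0, 1), "middle": (1, 2), "late": (2, 3)}

def insertion_schedule(text_ids, dup_levels, total_steps, window="full", seed=0):
    """Assign each text a duplication count and the steps where it is inserted.

    dup_levels: candidate duplication counts (assumed values within 1 to 256)
    window:     which part of training may contain insertions
    Returns a dict mapping text id -> sorted list of insertion steps.
    """
    rng = random.Random(seed)
    lo, hi = WINDOWS[window]
    start, stop = lo * total_steps // 3, hi * total_steps // 3
    schedule = {}
    for text_id in text_ids:
        count = rng.choice(dup_levels)  # random duplication count per text
        # Sample steps with replacement inside the allowed window.
        schedule[text_id] = sorted(rng.choices(range(start, stop), k=count))
    return schedule
```

For example, `insertion_schedule(["doc-a", "doc-b"], [1, 4, 16, 64, 256], 900, window="late")` would confine all insertions to the final third of a 900-step run, mimicking a late-exposure timing condition.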
allegrolab (Allegro Lab @ USC)
Org profile for Allegro Lab @ USC on Hugging Face, the AI community building the future.
Hubble: a Model Suite to Advance the Study of LLM Memorization
We present Hubble, a suite of fully open-source large language models (LLMs) for the scientific study of LLM memorization. Hubble models come in standard and perturbed variants: standard models...