Understanding Memorization via Loss Curvature
Models are known to memorize parts of their training data, but it was unclear whether memorization and general reasoning are stored in structurally different weight directions within the model. This research uses loss curvature to separate the weight directions associated with memorization from those associated with general computation and reasoning, then checks whether reasoning ability is preserved when only the memorization directions are removed.
Unlike BalancedSubnet, there is no need to specify which data to erase. Instead, the method identifies the structural signature of where the model stores memorization overall: directions used for reasoning are kept, and only memorization directions are removed.

K-FAC for curvature approximation
Under a single-sample loss, a memorized sample sits in a sharp minimum: the loss for that sample has high curvature, so predictions break under slight weight changes. This method, however, uses the loss curvature measured over the whole dataset. Low curvature → directions that are barely used → tied to only a few specific samples (= memorized samples). Moderate curvature → structure shared across many samples → general abilities such as reasoning, language understanding, and attention.
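The contrast between per-sample and dataset-wide curvature can be checked on a toy problem. The sketch below is not K-FAC and is not the paper's code: it assumes a linear model with squared-error loss, where the exact Hessian of one sample's loss is `x xᵀ`, so the curvature along a unit direction `v` is `(v · x)²`. The dataset, the two directions, and all function names are hypothetical and chosen only for illustration.

```python
# Toy illustration: per-sample vs. dataset-averaged curvature.
# Assumption: squared-error loss on a linear model,
#   L_i(w) = 0.5 * (w . x_i - y_i)^2,
# whose Hessian is x_i x_i^T, so curvature along unit v is (v . x_i)^2.

def curvature_along(x, v):
    """Curvature of one sample's loss along the unit direction v."""
    dot = sum(xi * vi for xi, vi in zip(x, v))
    return dot * dot

def mean_curvature(samples, v):
    """Curvature of the dataset-averaged loss along the unit direction v."""
    return sum(curvature_along(x, v) for x in samples) / len(samples)

N = 100
# Feature 0 is used by every sample (shared structure).
# Feature 1 is used only by sample 0 (a stand-in for a memorized sample).
samples = [[1.0, 5.0]] + [[1.0, 0.0] for _ in range(N - 1)]

shared_dir = [1.0, 0.0]  # direction every sample depends on
memo_dir = [0.0, 1.0]    # direction only the "memorized" sample depends on

sharp_single = curvature_along(samples[0], memo_dir)  # sharp for that one sample
avg_memo = mean_curvature(samples, memo_dir)          # diluted over the dataset
avg_shared = mean_curvature(samples, shared_dir)      # consistent across samples

print(sharp_single, avg_memo, avg_shared)  # 25.0 0.25 1.0
```

The single memorized sample is very sharp along its own direction (25.0), but averaged over 100 samples that direction ends up with lower curvature (0.25) than the shared direction (1.0), which is the ordering the notes rely on.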
Pure memorization: performance collapses to the 3~16% level. Logical reasoning: almost fully preserved, and in some cases slightly improved. Task spectrum: Memorization ← (Arithmetic / Factual Recall) - QA - Logical Reasoning → Reasoning. Mathematics: the reasoning process remains intact but calculation mistakes appear, which suggests that accurate arithmetic relies heavily on memorization-based structures.

Understanding Memorization via Loss Curvature
Our new paper proposes a method to identify and suppress memorized content in models. This explainer provides an overview of our work.
https://www.goodfire.ai/research/understanding-memorization-via-loss-curvature

HUBBLE
allegro-lab • Updated 2026 May 6 10:33
HUBBLE was motivated by the need to precisely control the frequency and timing of data insertion at the 1B and 8B parameter scales, in order to identify the causes and mechanisms of memorization.
HUBBLE is based on the Llama architecture and consists of a “Standard” model trained on a typical English corpus and a “Perturbed” model into which specific sensitive data is strategically inserted. To measure the degree of memorization, it uses the normalized log-likelihood $NL(x)$ as its main metric, which evaluates the predictive probability that the model $\theta$ assigns to a given text as follows.
$$NL(x) = \frac{1}{T} \sum_{i=1}^{T} \log p_\theta(x_i \mid x_{<i})$$
Here, $T$ denotes the length of the sequence and $x_i$ denotes the $i$-th token; the higher this value is, the more strongly the model has memorized the corresponding data.
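As a rough illustration of the metric, the sketch below assumes the normalized log-likelihood is the mean per-token log-probability of a sequence under the model. The function name and the example probabilities are hypothetical; in practice the per-token probabilities would come from the model's next-token distribution.

```python
import math

def normalized_log_likelihood(token_probs):
    """Mean log-probability per token.

    token_probs[i] is assumed to be the probability the model assigns to
    the i-th token given all preceding tokens.
    """
    if not token_probs:
        raise ValueError("sequence must contain at least one token")
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities, for illustration only.
memorized = [0.9, 0.95, 0.9, 0.99]  # model predicts each token confidently
unseen = [0.1, 0.2, 0.05, 0.1]      # model is unsure about each token

nl_memorized = normalized_log_likelihood(memorized)
nl_unseen = normalized_log_likelihood(unseen)

print(nl_memorized > nl_unseen)  # True
```

Dividing by the sequence length keeps the score comparable across texts of different lengths, so a long inserted passage is not penalized simply for containing more tokens.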
From an experimental-design perspective, HUBBLE randomly inserted duplicates of texts from various domains (Gutenberg book passages, Wikipedia, YAGO biographical data, MMLU, etc.), with each text inserted between 1 and 256 times. It also includes “Timing run” models that control at which stage of training (Early, Middle, Late) the data is exposed, making it possible to track memorization and forgetting dynamics over the course of training.
allegrolab (Allegro Lab @ USC)
Org profile for Allegro Lab @ USC on Hugging Face, the AI community building the future.
https://huggingface.co/allegrolab
Hubble: a Model Suite to Advance the Study of LLM Memorization
We present Hubble, a suite of fully open-source large language models (LLMs) for the scientific study of LLM memorization. Hubble models come in standard and perturbed variants: standard models...
https://arxiv.org/abs/2510.19811


Seonglae Cho