AI Memorization

Understanding Memorization via Loss Curvature

Models are known to memorize parts of their training data, but it has been unclear whether memorization and general reasoning are stored in structurally different directions in weight space. This research uses loss curvature to separate the weight directions associated with memorization from those associated with general computation and reasoning, and tests whether reasoning ability is preserved when only the memorization directions are removed.

Unlike BalancedSubnet, there is no need to specify which data to erase. Instead, the method identifies the structural characteristics of where the model stores memorization overall: directions used for reasoning are kept, and only memorization directions are removed.
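As a rough illustration of editing by curvature alone, the toy sketch below builds a dataset-averaged curvature matrix and keeps only the highest-curvature direction of a parameter vector. It relies on the premise this note describes under the K-FAC heading, that memorized samples contribute sharp but sample-specific directions. Every name and number here (`per_sample_curvature`, `sharpness`, the diagonal toy Hessians) is a hypothetical choice for illustration, not the paper's setup.

```python
import numpy as np

n_samples = 50
dim = n_samples + 1   # one shared direction plus one private direction per sample
sharpness = 10.0      # hypothetical curvature of a memorized sample

def per_sample_curvature(i):
    """Toy Hessian for sample i: shared direction 0 plus a sharp private direction i + 1."""
    h = np.zeros((dim, dim))
    h[0, 0] = 1.0                   # direction every sample relies on
    h[i + 1, i + 1] = sharpness     # direction only sample i relies on
    return h

# Averaging over the dataset shrinks each private direction to sharpness / n_samples,
# while the shared direction keeps its full curvature.
mean_curvature = sum(per_sample_curvature(i) for i in range(n_samples)) / n_samples

# eigh returns eigenvalues in ascending order, so the last column is the
# highest-curvature direction.
eigvals, eigvecs = np.linalg.eigh(mean_curvature)
top = eigvecs[:, -1:]

# Edit a parameter vector by projecting it onto the high-curvature direction.
# No individual sample is named: only the aggregated curvature is used.
rng = np.random.default_rng(0)
theta = rng.normal(size=dim)
theta_edited = top @ (top.T @ theta)
```

In this toy, each sample looks sharp on its own (curvature 10 along its private direction), yet the averaged curvature along any private direction is only 0.2, below the shared direction's 1.0. The projection therefore keeps the shared component of `theta` and zeroes the rest.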

K-FAC for curvature approximation

For a single-sample loss, a memorized sample sits in a sharp minimum: the loss for that sample has high curvature, so its prediction breaks under slight weight changes. The method, however, uses loss curvature computed over the whole dataset, where each memorized sample is sharp only in its own direction and those contributions average out.

Low curvature → directions that are barely used → tied only to a few specific samples (= memorized samples).

Moderate curvature → structures used in common across many samples → general abilities such as reasoning, language understanding, and attention.
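A minimal sketch of how a K-FAC-style approximation could be used to edit one linear layer, assuming the standard K-FAC factorization: the curvature of a layer `y = W a` is approximated by the Kronecker product of the input-activation covariance `A = E[a aᵀ]` and the output-gradient covariance `G = E[g gᵀ]`. The function name `kfac_edit`, the `keep_fraction` selection rule, and the random data are assumptions for illustration. The paper's actual selection criterion may differ.

```python
import numpy as np

def kfac_edit(weight, acts, grads, keep_fraction=0.5):
    """Zero out the lowest-curvature components of a linear layer's weight.

    weight: (out, in) matrix; acts: (n, in) layer inputs;
    grads: (n, out) loss gradients with respect to the layer outputs.
    """
    n = acts.shape[0]
    A = acts.T @ acts / n      # input covariance factor, (in, in)
    G = grads.T @ grads / n    # output-gradient covariance factor, (out, out)

    mu, V = np.linalg.eigh(A)
    lam, U = np.linalg.eigh(G)

    # Under the Kronecker approximation, the basis direction u_i v_j^T
    # has curvature lam[i] * mu[j].
    curvature = np.outer(lam, mu)

    # Express the weight in that basis.
    coeffs = U.T @ weight @ V

    # Keep the top `keep_fraction` of components by curvature
    # (ties at the threshold are all kept).
    k = max(1, int(round(keep_fraction * curvature.size)))
    threshold = np.sort(curvature, axis=None)[-k]
    mask = curvature >= threshold

    return U @ (coeffs * mask) @ V.T, mask

# Hypothetical data: the last input feature is almost unused.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 4)) * np.array([1.0, 1.0, 1.0, 0.01])
grads = rng.normal(size=(200, 3))
weight = rng.normal(size=(3, 4))

edited, mask = kfac_edit(weight, acts, grads, keep_fraction=0.5)
```

Because `U` and `V` are orthogonal, keeping every component (`keep_fraction=1.0`) reproduces the original weight, and masking can only shrink its Frobenius norm. Working with the two small factors avoids forming the full `(out·in) × (out·in)` curvature matrix.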

Results after removing the memorization directions:

Pure memorization: performance collapses to the 3 to 16% level.

Logical reasoning: almost fully preserved, or slightly improved.

Task spectrum, from memorization to reasoning: arithmetic and factual recall sit at the memorization end, QA in the middle, and logical reasoning at the reasoning end.

Mathematics: the reasoning process remains intact but calculation mistakes appear, which suggests that accurate arithmetic relies heavily on memorization-based structures.

Understanding Memorization via Loss Curvature

Our new paper proposes a method to identify and suppress memorized content in models. This explainer provides an overview of our work.

https://www.goodfire.ai/research/understanding-memorization-via-loss-curvature
