Understanding Memorization via Loss Curvature
While it is known that models memorize parts of their training data, it was unclear whether memorization and general reasoning are stored in structurally different weight directions within the model. This work uses loss curvature to separate the weight directions associated with memorization from those associated with general computation and reasoning, and checks whether reasoning ability is preserved when only the memorization directions are removed.
Unlike BalancedSubnet, there is no need to specify which data to erase: the method identifies the structural signature of where the model stores memorization overall. Directions used for reasoning are kept; only memorization directions are removed.

K-FAC for curvature approximation
For a single-sample loss, a memorized sample sits at a sharp minimum: the loss on that sample has high curvature, and its predictions break under small weight changes. The method, however, measures the curvature of the dataset-wide loss, where the picture flips. Each memorized sample's sharp directions are idiosyncratic, so averaged over the whole dataset those directions end up with low curvature: barely used, tied only to a few specific (memorized) samples. Directions with moderate curvature are structures used consistently across many samples, i.e. general abilities such as reasoning, language understanding, and attention.
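The edit described above can be sketched with a toy K-FAC layer. This is a minimal illustration, not the paper's implementation: it assumes a single linear layer whose curvature is approximated by the Kronecker product of an input-activation covariance `A` and an output-gradient covariance `G`, expresses the weight matrix in that eigenbasis, and zeros out the low-curvature (memorization-like) coefficients. The data, dimensions, and the 30th-percentile threshold are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer: W is (out x in). K-FAC approximates the layer's curvature
# (Fisher) as G ⊗ A, where A is the input-activation covariance and G is the
# covariance of the gradients w.r.t. the layer's output.
d_in, d_out, n = 8, 6, 100
X = rng.normal(size=(n, d_in))        # stand-in activations
Dy = rng.normal(size=(n, d_out))      # stand-in output gradients
A = X.T @ X / n                       # K-FAC input factor
G = Dy.T @ Dy / n                     # K-FAC output factor

la, Va = np.linalg.eigh(A)            # eigendecompositions (ascending eigenvalues)
lg, Vg = np.linalg.eigh(G)

W = rng.normal(size=(d_out, d_in))

# Coefficients of W in the Kronecker eigenbasis; the curvature attached to
# coefficient (i, j) is lg[i] * la[j].
C = Vg.T @ W @ Va
curv = np.outer(lg, la)

# Keep high-curvature directions (shared computation across many samples);
# zero out low-curvature directions (sample-specific, memorization-like).
tau = np.percentile(curv, 30)         # illustrative threshold
C_edit = np.where(curv >= tau, C, 0.0)
W_edit = Vg @ C_edit @ Va.T           # edited weights, same shape as W
```

Because `Va` and `Vg` are orthogonal, the edit is exact in the eigenbasis: mapping `W_edit` back into coefficients recovers `C_edit`, with zeros exactly where the curvature fell below the threshold.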
After removing the low-curvature directions: pure memorization collapses to the 3~16% level, while logical reasoning is almost fully preserved or even slightly improved. Tasks fall on a spectrum from memorization to reasoning: Arithmetic / Factual Recall (memorization end) — QA — Logical Reasoning (reasoning end). In mathematics, the reasoning process remains intact but calculation mistakes appear, suggesting that accurate arithmetic relies heavily on memorization-like structures.


Seonglae Cho
