Method: feed both math (GSM8K/MATH) and non-math (RACE/MMLU) inputs to the model to identify each domain's Top-K most important parameter set, then take the set difference to isolate math-specific parameters: math-specific = math Top-K \ non-math Top-K (a sketch follows below).
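A minimal sketch of this procedure, assuming a first-order gradient-times-weight importance score (the note does not specify the exact metric) and hypothetical helpers `compute_loss`, `gsm8k_batch`, and `mmlu_batch`:

```python
import torch

def topk_param_indices(model, loss, k):
    # Assumption: importance = |w * dL/dw| (a first-order Taylor score);
    # the source does not name the importance metric actually used.
    model.zero_grad()
    loss.backward()
    scores = torch.cat([
        (p.detach() * p.grad).abs().flatten()
        for p in model.parameters()
        if p.grad is not None
    ])
    # Indices are global positions into the flattened parameter vector.
    return set(torch.topk(scores, k).indices.tolist())

# model is any torch.nn.Module (e.g., a HF causal LM);
# compute_loss(model, batch) is a hypothetical scalar-loss helper.
math_topk = topk_param_indices(model, compute_loss(model, gsm8k_batch), k=100_000)
nonmath_topk = topk_param_indices(model, compute_loss(model, mmlu_batch), k=100_000)

# Math-specific parameters: important for math but not for general language.
math_specific = math_topk - nonmath_topk
```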
Findings: when the math-specific parameters are deleted, math performance drops dramatically, while non-math performance declines only slightly, comparable to random pruning, so general language capabilities are preserved. Even a single sample achieves meaningful separation, though multiple samples give more stable results. Different sample sets identify largely the same parameters, with overlap of roughly 95% or higher once 100+ samples are used. The math-specific parameters are distributed evenly across decoder blocks rather than concentrated in particular layers, suggesting the existence of cross-layer circuits.
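To make the deletion experiment concrete, here is a minimal sketch, assuming "deleting" a parameter means zeroing it in place and reusing the global flat indices from the sketch above; `evaluate` and the benchmark names are hypothetical, and the overlap measure (intersection over union) is one plausible choice, not necessarily the one used:

```python
import torch

def ablate_params(model, indices):
    # Zero the selected weights in place. Assumes every parameter received
    # a gradient in the importance pass, so flat offsets align with the
    # indexing produced by topk_param_indices above.
    idx = torch.tensor(sorted(indices))
    offset = 0
    for p in model.parameters():
        n = p.numel()
        local = idx[(idx >= offset) & (idx < offset + n)] - offset
        if local.numel() > 0:
            with torch.no_grad():
                p.view(-1)[local] = 0.0
        offset += n

def identification_overlap(set_a, set_b):
    # One plausible stability check: how consistently do two different
    # sample sets pick out the same parameters?
    return len(set_a & set_b) / len(set_a | set_b)

# Usage: ablate the math-specific set, then re-run both benchmark suites.
ablate_params(model, math_specific)
# evaluate(model, math_benchmarks)     -> expected: large drop
# evaluate(model, nonmath_benchmarks)  -> expected: small drop, like random pruning
```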