Unlearning benchmarks, TOFU included, share one major risk: scores can be improved simply by breaking the model. In other words, if you make the model so brittle that it cannot say anything about the forget set, Forget Accuracy increases, but this is far from meaningful selective forgetting. That is why Retain Accuracy is also essential, and why a combined score over forget and retain is used.
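To make the failure mode concrete, here is a minimal sketch of how a combined score can penalize a "broken" model. It assumes both scores are normalized to [0, 1] with higher meaning better, and it combines them with a harmonic mean. The function name `combined_score`, the harmonic mean, and the example numbers are all illustrative assumptions, not the aggregation that TOFU or any particular benchmark actually defines.

```python
def combined_score(forget_score: float, retain_accuracy: float) -> float:
    """Harmonic mean of two scores in [0, 1] (an illustrative choice).

    The harmonic mean stays low whenever either input is low, so a model
    cannot score well by maximizing forgetting while wrecking retention.
    """
    for name, value in (("forget_score", forget_score),
                        ("retain_accuracy", retain_accuracy)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = forget_score + retain_accuracy
    if total == 0.0:
        return 0.0  # both scores are zero; avoid division by zero
    return 2.0 * forget_score * retain_accuracy / total


# Hypothetical numbers: a model that "forgets" by refusing everything,
# versus one that forgets selectively and keeps its other knowledge.
broken_model = combined_score(forget_score=1.0, retain_accuracy=0.1)
selective_model = combined_score(forget_score=0.9, retain_accuracy=0.9)

print(f"broken model:    {broken_model:.3f}")
print(f"selective model: {selective_model:.3f}")
```

Under these assumptions the broken model gets a perfect forget score but a low combined score, while the selective model scores well on both. A plain arithmetic mean would be more forgiving of the broken model, which is why a harmonic mean (or a similar "both must be good" aggregate) is a natural choice for this kind of sketch.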