Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models

In this paper, we explore machine unlearning from a new dimension by studying how to safeguard model unlearning in large language models (LLMs). Our goal is to prevent unlearned models from recalling any related memory of the targeted knowledge. We begin by uncovering a surprisingly simple yet overlooked fact: existing methods typically erase only the exact expressions of the targeted knowledge, leaving paraphrased or related information intact. To rigorously measure such oversights, we introduce UGBench, the first benchmark tailored for evaluating generalisation performance across 13 state-of-the-art unlearning methods. UGBench reveals that unlearned models can still recall paraphrased answers and retain target facts in intermediate layers. To address this, we propose PERMU, a perturbation-based method that significantly enhances the generalisation capabilities of LLM unlearning. Experiments demonstrate that PERMU delivers up to a 50.13% improvement in unlearning while maintaining a 43.53% boost in robust generalisation. Our code can be found at this https URL.
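As a concrete illustration of the paraphrase-leakage issue described above, a minimal probe might query an unlearned model with both the original and paraphrased phrasings of a forgotten fact and check whether the answer still surfaces. This is only a hedged sketch, not the authors' UGBench or PERMU code; the model checkpoint, queries, and string-match criterion are illustrative assumptions.

```python
# Sketch: check whether an "unlearned" model still leaks a forgotten fact
# when the question is paraphrased. Checkpoint name and prompts are
# hypothetical; UGBench's actual metrics are more involved.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/unlearned-model"        # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

target_answer = "Paris"                       # fact the model was asked to forget
queries = [
    "What is the capital of France?",         # exact phrasing seen during unlearning
    "Which city serves as France's capital?", # paraphrase
    "Name the French capital.",               # paraphrase
]

for q in queries:
    inputs = tokenizer(q, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    answer = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    leaked = target_answer.lower() in answer.lower()
    print(f"{q!r} -> {answer!r} | leaked: {leaked}")
```

If the model refuses the exact phrasing but still answers the paraphrases, the targeted knowledge has not been fully forgotten, which is precisely the generalisation gap the paper measures.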
@article{wang2025_2502.19982,
  title   = {Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models},
  author  = {Huazheng Wang and Yongcheng Jing and Haifeng Sun and Yingjie Wang and Jingyu Wang and Jianxin Liao and Dacheng Tao},
  journal = {arXiv preprint arXiv:2502.19982},
  year    = {2025}
}