34
1

Continual Memorization of Factoids in Language Models

Abstract

As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tuning for memorization may be ineffective in storing knowledge or may exacerbate hallucinations. In this work, we introduce a setting we call continual memorization, where a model must memorize and retain a set of factoids through multiple stages of fine-tuning on subsequent datasets. We characterized the forgetting patterns through extensive experiments and show that LMs widely suffer from forgetting, especially when needing to memorize factoids in the second stage. We posit that forgetting can be alleviated by modifying training dynamics: (1) protecting the memorization process when learning factoids or (2) reducing interference from subsequent training stages. Intriguingly, we find that mixing randomly generated word sequences or generic data sampled from pretraining corpora at different training stages effectively mitigates forgetting REMIX: Random and Generic Data Mixing). REMIX can recover performance from severe forgetting, outperforming replay methods and other continual learning baselines. We analyze how REMIX influences the learning process and find that robust memorization follows a distinct pattern: the model stores factoids in earlier layers than usual and diversifies the layers that retain them, which results in easier recall and manipulate of the learned factoids.

View on arXiv
@article{chen2025_2411.07175,
  title={ Continual Memorization of Factoids in Language Models },
  author={ Howard Chen and Jiayi Geng and Adithya Bhaskar and Dan Friedman and Danqi Chen },
  journal={arXiv preprint arXiv:2411.07175},
  year={ 2025 }
}
Comments on this paper