
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

Abstract

An important question today is whether a given text was used to train a large language model (LLM). A completion test is often employed: check whether the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, a text is considered a member if it has n-gram overlap with some text in the training dataset. In this work, we demonstrate that this n-gram based membership definition can be effectively gamed. We study scenarios where sequences are non-members for a given n, and we find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps, and they show that it is difficult to find a single viable choice of n for membership definitions. Using these insights, we design adversarial datasets that cause a given target sequence to be completed without containing it, for any reasonable choice of n. Our findings highlight the inadequacy of n-gram membership and suggest that membership definitions fail to account for auxiliary information available to the training algorithm.
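To make the setup concrete, the sketch below illustrates the two ingredients the abstract refers to: an n-gram overlap membership check and a verbatim completion test. This is a minimal illustration, not the authors' implementation; the tokenization, the choice of n, the prefix/suffix split, and the model_generate stand-in are all assumptions introduced here.

def ngrams(tokens, n):
    # All contiguous n-grams of a token sequence (illustrative helper).
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_ngram_member(target_tokens, training_token_seqs, n):
    # One common variant of the n-gram membership definition: the target is a
    # member if any length-n span of it also appears in some training sequence.
    target_grams = ngrams(target_tokens, n)
    return any(target_grams & ngrams(seq, n) for seq in training_token_seqs)

def completion_test(model_generate, target_tokens, prefix_len):
    # Prompt the LLM with a prefix of the target and check whether it
    # reproduces the remaining suffix verbatim. `model_generate` is a
    # hypothetical stand-in for greedy decoding with the model under test.
    prefix = target_tokens[:prefix_len]
    suffix = target_tokens[prefix_len:]
    output = model_generate(prefix, max_new_tokens=len(suffix))
    return list(output[:len(suffix)]) == list(suffix)

The paper's observation, in these terms, is that completion_test can succeed on a target for which is_ngram_member returns False for every reasonable n.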

@article{liu2025_2503.17514,
  title={Language Models May Verbatim Complete Text They Were Not Explicitly Trained On},
  author={Ken Ziyu Liu and Christopher A. Choquette-Choo and Matthew Jagielski and Peter Kairouz and Sanmi Koyejo and Percy Liang and Nicolas Papernot},
  journal={arXiv preprint arXiv:2503.17514},
  year={2025}
}