Detecting Training Data of Large Language Models via Expectation Maximization

10 October 2024

Gyuwan Kim

Yang Li

Evangelia Spiliopoulou

Jie Ma

Miguel Ballesteros

Main: 9 pages; Bibliography: 4 pages; Appendix: 2 pages; 4 figures; 5 tables

Abstract

Membership inference attacks (MIAs) aim to determine whether a specific example was used to train a given language model. While prior work has explored prompt-based attacks such as ReCALL, these methods rely heavily on the assumption that using known non-members as prompts reliably suppresses the model's responses to non-member queries. We propose EM-MIA, a new membership inference approach that iteratively refines prefix effectiveness and membership scores using an expectation-maximization strategy without requiring labeled non-member examples. To support controlled evaluation, we introduce OLMoMIA, a benchmark that enables analysis of MIA robustness under systematically varied distributional overlap and difficulty. Experiments on WikiMIA and OLMoMIA show that EM-MIA outperforms existing baselines, particularly in settings with clear distributional separability. We highlight scenarios where EM-MIA succeeds in practical settings with partial distributional overlap, while failure cases expose fundamental limitations of current MIA methods under near-identical conditions. We release our code and evaluation pipeline to encourage reproducible and robust MIA research.
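The alternating refinement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the score matrix `S` (entry `S[i, j]` is a hypothetical prompt-based score for target `j` under candidate prefix `i`), the median-threshold pseudo-labels, the AUC-based prefix effectiveness, and the weighted-average update are all assumptions chosen only to show how prefix effectiveness and membership scores might be refined in turn without labeled non-members.

```python
import numpy as np


def auc(scores, labels):
    """Rank-based AUC-ROC: chance that a positive outscores a negative (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return 0.5  # undefined without both classes; fall back to chance level
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def em_refine(S, init_membership, n_iters=10):
    """Alternate between estimating prefix effectiveness and membership scores.

    S[i, j] is an assumed score of target j when candidate prefix i is prepended.
    init_membership is any label-free initial guess (e.g. a loss-based score).
    """
    S = np.asarray(S, dtype=float)
    membership = np.asarray(init_membership, dtype=float)
    prefix_eff = np.full(S.shape[0], 0.5)
    for _ in range(n_iters):
        # Pseudo-labels from the current membership estimate (assumed median split).
        pseudo = membership > np.median(membership)
        # Prefix effectiveness: how well each prefix's scores separate the pseudo-labels.
        prefix_eff = np.array([auc(row, pseudo) for row in S])
        # Membership update: average the score rows, weighting better-than-chance prefixes.
        weights = np.clip(prefix_eff - 0.5, 0.0, None)
        if weights.sum() == 0.0:
            weights = np.ones_like(weights)
        membership = weights @ S / weights.sum()
    return membership, prefix_eff
```

In this sketch, calling `em_refine(S, init)` returns the refined membership scores together with the last prefix-effectiveness estimates; uninformative prefixes end up with weights near zero, so they contribute little to the final scores.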
