Automatic Calibration for Membership Inference Attack on Large Language Models

Membership Inference Attacks (MIAs) have recently been employed to determine whether a specific text was part of the pre-training data of Large Language Models (LLMs). However, existing methods often misclassify non-members as members, leading to a high false positive rate, or depend on additional reference models for probability calibration, which limits their practicality. To overcome these challenges, we introduce a novel framework called Automatic Calibration Membership Inference Attack (ACMIA), which utilizes a tunable temperature to calibrate output probabilities effectively. This approach is inspired by our theoretical insights into maximum likelihood estimation during the pre-training of LLMs. We develop ACMIA in three configurations designed to accommodate different levels of model access and to widen the probability gap between members and non-members, improving the reliability and robustness of membership inference. Extensive experiments on various open-source LLMs demonstrate that our proposed attack is highly effective, robust, and generalizable, surpassing state-of-the-art baselines across three widely used benchmarks. Our code is available on GitHub at: this https URL.
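To make the temperature-based calibration idea concrete, the sketch below rescales a causal LM's logits with a tunable temperature before computing token log-likelihoods, and scores membership by the gap between the calibrated and uncalibrated likelihoods. This is a minimal illustration under stated assumptions, not ACMIA's exact scoring functions: the model name, the temperature value, and the difference-based decision rule are all illustrative choices, and the paper's three configurations are not reproduced here.

```python
# Minimal sketch of temperature-calibrated membership scoring for a causal LM.
# Assumptions (not from the paper): gpt2 as a stand-in model, tau=0.7 as the
# tuned temperature, and the calibrated-vs-raw likelihood gap as the score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def calibrated_log_likelihood(text, model, tokenizer, tau=1.0):
    """Average token log-likelihood under a temperature-scaled softmax."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # (1, seq_len, vocab_size)
    # Divide logits by the tunable temperature tau before the softmax;
    # tau=1.0 recovers the model's raw output probabilities.
    log_probs = torch.log_softmax(logits[:, :-1] / tau, dim=-1)
    targets = ids[:, 1:].unsqueeze(-1)  # next-token targets
    token_ll = log_probs.gather(-1, targets).squeeze(-1)
    return token_ll.mean().item()

model_name = "gpt2"  # stand-in; the paper evaluates various open-source LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Illustrative membership score: how much a sharpened (tau < 1) softmax
# boosts the text's likelihood relative to the raw one. In practice the
# threshold on this score would be tuned on held-out member/non-member data.
text = "Example candidate pre-training sentence."
score = (calibrated_log_likelihood(text, model, tok, tau=0.7)
         - calibrated_log_likelihood(text, model, tok, tau=1.0))
print(f"membership score: {score:.4f}")
```

The intuition behind this sketch is that member texts tend to sit in sharper modes of the model's output distribution, so lowering the temperature raises their likelihood more than it does for non-members; no external reference model is needed for the calibration.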
@article{zade2025_2505.03392,
  title={Automatic Calibration for Membership Inference Attack on Large Language Models},
  author={Saleh Zare Zade and Yao Qiang and Xiangyu Zhou and Hui Zhu and Mohammad Amin Roshani and Prashant Khanduri and Dongxiao Zhu},
  journal={arXiv preprint arXiv:2505.03392},
  year={2025}
}