Instance-Level Data-Use Auditing of Visual ML Models

The growing trend of legal disputes over the unauthorized use of data in machine learning (ML) systems highlights the urgent need for reliable data-use auditing mechanisms to ensure accountability and transparency in ML. In this paper, we present the first proactive instance-level data-use auditing method designed to enable data owners to audit the use of their individual data instances in ML models, providing more fine-grained auditing results. Our approach integrates any black-box membership inference technique with a sequential hypothesis test, providing a quantifiable and tunable false-detection rate. We evaluate our method on three types of visual ML models: image classifiers, visual encoders, and Contrastive Language-Image Pretraining (CLIP) models. In addition, we apply our method to evaluate the performance of two state-of-the-art approximate unlearning methods. Our findings reveal that neither method successfully removes the influence of the unlearned data instances from image classifiers and CLIP models, even when model utility is sacrificed.
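To illustrate how a sequential hypothesis test can wrap a black-box membership-inference signal and yield a tunable false-detection rate, here is a minimal sketch. It is not the authors' implementation: the function name `sequential_audit`, the alternative parameter `p1`, and the Bernoulli(1/2) null model are illustrative assumptions. The idea assumed here is that, for each audited instance, the membership-inference procedure reports whether the model appears to have memorized the owner's published (marked) version rather than a withheld counterpart; under non-use this outcome is a fair coin, and a sequential likelihood-ratio test stopped at threshold 1/alpha bounds the false-detection probability by alpha via Ville's inequality.

```python
import math
from typing import Iterable

def sequential_audit(
    mi_outcomes: Iterable[bool],
    alpha: float = 0.01,
    p1: float = 0.7,
) -> bool:
    """Hypothetical sketch of instance-level data-use auditing.

    Each element of `mi_outcomes` is True when the black-box membership
    inference flags the owner's published (marked) instance rather than a
    withheld counterpart. Under the null hypothesis (the model never
    trained on the owner's data), each outcome is assumed Bernoulli(1/2).

    We run a sequential likelihood-ratio test of Bernoulli(1/2) against
    Bernoulli(p1) and declare detection once the likelihood ratio exceeds
    1/alpha; by Ville's inequality the false-detection rate is at most
    alpha, making it quantifiable and tunable.
    """
    log_lr = 0.0
    threshold = math.log(1.0 / alpha)
    for flagged in mi_outcomes:
        if flagged:
            log_lr += math.log(p1 / 0.5)        # evidence of data use
        else:
            log_lr += math.log((1.0 - p1) / 0.5)  # evidence against
        if log_lr >= threshold:
            return True   # detection: data use inferred at level alpha
    return False          # insufficient evidence; no detection
```

In use, `mi_outcomes` would be produced by any off-the-shelf membership-inference attack applied to the audited model, one outcome per marked instance; the test consumes outcomes as they arrive and can stop early once the evidence threshold is crossed.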
@article{huang2025_2503.22413,
  title   = {Instance-Level Data-Use Auditing of Visual ML Models},
  author  = {Zonghao Huang and Neil Zhenqiang Gong and Michael K. Reiter},
  journal = {arXiv preprint arXiv:2503.22413},
  year    = {2025}
}