Multi-Modal Large Language Models (MLLMs) have exhibited remarkable performance on various vision-language tasks such as Visual Question Answering (VQA). Despite accumulating evidence of privacy concerns associated with task-relevant content, it remains unclear whether MLLMs inadvertently memorize private content that is entirely irrelevant to the training tasks. In this paper, we investigate how randomly generated task-irrelevant private content can become spuriously correlated with downstream objectives due to partial mini-batch training dynamics, thus causing inadvertent memorization. Concretely, we embed randomly generated task-irrelevant watermarks into VQA fine-tuning images at varying probabilities and propose a novel probing framework to determine whether MLLMs have inadvertently encoded such content. Our experiments reveal that MLLMs exhibit notably different training behaviors in partial mini-batch settings when task-irrelevant watermarks are embedded. Furthermore, through layer-wise probing, we demonstrate that MLLMs trigger distinct representational patterns when encountering previously seen task-irrelevant knowledge, even if this knowledge does not influence their output during prompting. Our code is available at this https URL.
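The abstract describes two ingredients: embedding task-irrelevant watermarks into a fraction of the fine-tuning images, and layer-wise probing of hidden states to test whether the watermark was encoded. Below is a minimal sketch of how such a setup could look, assuming a PIL-based preprocessing pipeline and a transformer model that exposes per-layer hidden states; the function names, the watermark format, and the probability value are illustrative assumptions, not the paper's released implementation.

```python
import random
import torch
import torch.nn as nn
from PIL import Image, ImageDraw


def embed_watermark(image: Image.Image, text: str, position=(10, 10)) -> Image.Image:
    """Overlay a task-irrelevant text watermark onto a VQA training image."""
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    draw.text(position, text, fill=(255, 0, 0))
    return marked


def maybe_watermark(image: Image.Image, p: float = 0.3):
    """With probability p, embed a randomly generated task-irrelevant watermark.

    Returns the (possibly modified) image and a 0/1 flag used later as the
    probing label. The watermark string is hypothetical "private" content
    with no relation to the VQA objective.
    """
    if random.random() < p:
        wm_text = f"WM-{random.randint(0, 9999):04d}"
        return embed_watermark(image, wm_text), 1
    return image, 0


class LinearProbe(nn.Module):
    """Layer-wise linear probe: given hidden states from one transformer layer,
    predict whether the input image carried a previously seen watermark."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, 2)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool the token representations of the chosen layer, then classify.
        pooled = hidden_states.mean(dim=1)
        return self.classifier(pooled)
```

In this sketch, one probe would be trained per layer on frozen hidden states; above-chance probe accuracy at a given layer would indicate that the watermark information is linearly recoverable there, even when the model's generated answers show no trace of it.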
@article{ju2025_2503.01208,
  title={Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models},
  author={Tianjie Ju and Yi Hua and Hao Fei and Zhenyu Shao and Yubin Zheng and Haodong Zhao and Mong-Li Lee and Wynne Hsu and Zhuosheng Zhang and Gongshen Liu},
  journal={arXiv preprint arXiv:2503.01208},
  year={2025}
}