SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Main: 12 Pages
8 Figures
Bibliography: 4 Pages
5 Tables
Appendix: 1 Page
Abstract
Retrieval-Augmented Generation (RAG) and its multimodal variant, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs) by introducing external knowledge sources. However, retrieval and multimodal fusion obscure content provenance: existing membership inference methods cannot reliably attribute generated outputs to pre-training, external retrieval, or user input, which undermines accountability for privacy leakage.