arXiv:2406.06443
LLM Dataset Inference: Did you train on my dataset?
10 June 2024
Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic
MIALM
Papers citing "LLM Dataset Inference: Did you train on my dataset?" (13 papers)
Revisiting Data Auditing in Large Vision-Language Models
Hongyu Zhu, Sichu Liang, W. Wang, Boheng Li, Tongxin Yuan, Fangqi Li, Shilin Wang, Zhuosheng Zhang
VLM
25 Apr 2025
Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection
Ali Naseh, Niloofar Mireshghallah
20 Jan 2025
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang
MIALM
10 Oct 2024
Fine-tuning can Help Detect Pretraining Data from Large Language Models
H. Zhang, Songxin Zhang, Bingyi Jing, Hongxin Wei
09 Oct 2024
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev
04 Oct 2024
Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
Jie Zhang, Debeshee Das, Gautam Kamath, Florian Tramèr
MIALM
MIACV
29 Sep 2024
Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Qizhang Feng, Siva Rajesh Kasa, Santhosh Kumar Kasa, Hyokun Yun, C. Teo, S. Bodapati
08 Jul 2024
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, ..., Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang
06 Dec 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
SyDa
14 Jul 2021
Dataset Inference: Ownership Resolution in Machine Learning
Pratyush Maini, Mohammad Yaghini, Nicolas Papernot
FedML
21 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel
MLAU
SILM
14 Dec 2020
Combining p-values via averaging
V. Vovk, Ruodu Wang
FedML
20 Dec 2012