2402.10892
Proving membership in LLM pretraining data via data watermarks
16 February 2024
Johnny Tian-Zheng Wei
Ryan Yixiang Wang
Robin Jia
WaLM
Papers citing "Proving membership in LLM pretraining data via data watermarks"
9 / 9 papers shown
The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Matthieu Meeus
Lukas Wutschitz
Santiago Zanella Béguelin
Shruti Tople
Reza Shokri
75
0
0
24 Feb 2025
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
55
3
0
24 Oct 2024
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović
Robin Staab
Maximilian Baader
Martin Vechev
110
1
0
04 Oct 2024
Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research
H. Haresamudram
Hrudhai Rajasekhar
Nikhil Murlidhar Shanbhogue
Thomas Ploetz
29
1
0
09 Jun 2024
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
39
3
0
24 May 2024
Data Portraits: Recording Foundation Model Training Data
Marc Marone
Benjamin Van Durme
135
30
0
06 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
306
11,909
0
04 Mar 2022
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng
Arunesh Mathur
Arvind Narayanan
97
93
0
06 Aug 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020