2202.06539
Cited By
Deduplicating Training Data Mitigates Privacy Risks in Language Models
14 February 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
Papers citing
"Deduplicating Training Data Mitigates Privacy Risks in Language Models"
12 / 62 papers shown
Title
Provably Confidential Language Modelling
Xuandong Zhao
Lei Li
Yu-Xiang Wang
MU
14
15
0
04 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
58
797
0
14 Apr 2022
InCoder: A Generative Model for Code Infilling and Synthesis
Daniel Fried
Armen Aghajanyan
Jessy Lin
Sida I. Wang
Eric Wallace
Freda Shi
Ruiqi Zhong
Wen-tau Yih
Luke Zettlemoyer
M. Lewis
SyDa
22
625
0
12 Apr 2022
Mix and Match: Learning-free Controllable Text Generation using Energy Language Models
Fatemehsadat Mireshghallah
Kartik Goyal
Taylor Berg-Kirkpatrick
30
78
0
24 Mar 2022
Do Language Models Plagiarize?
Jooyoung Lee
Thai Le
Jinghui Chen
Dongwon Lee
20
73
0
15 Mar 2022
Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
Fatemehsadat Mireshghallah
Kartik Goyal
Archit Uniyal
Taylor Berg-Kirkpatrick
Reza Shokri
MIALM
25
151
0
08 Mar 2022
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
134
346
0
13 Oct 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
240
590
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,812
0
14 Dec 2020
When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?
Gavin Brown
Mark Bun
Vitaly Feldman
Adam D. Smith
Kunal Talwar
245
80
0
11 Dec 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
406
2,584
0
03 Sep 2019