2202.07646
Cited By
Quantifying Memorization Across Neural Language Models
15 February 2022
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
Papers citing
"Quantifying Memorization Across Neural Language Models"
31 / 131 papers shown
ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models
Jonas Belouadi
Steffen Eger
51
24
0
20 Dec 2022
Event knowledge in large language models: the gap between the impossible and the unlikely
Carina Kauf
Anna A. Ivanova
Giulia Rambelli
Emmanuele Chersoni
Jingyuan Selena She
Zawad Chowdhury
Evelina Fedorenko
Alessandro Lenci
37
67
0
02 Dec 2022
Validating Large Language Models with ReLM
Michael Kuchnik
Virginia Smith
George Amvrosiadis
24
27
0
21 Nov 2022
Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe
Xiang Yue
Huseyin A. Inan
Xuechen Li
Girish Kumar
Julia McAnallen
Hoda Shajari
Huan Sun
David Levitan
Robert Sim
38
79
0
25 Oct 2022
Exploring Mode Connectivity for Pre-trained Language Models
Yujia Qin
Cheng Qian
Jing Yi
Weize Chen
Yankai Lin
Xu Han
Zhiyuan Liu
Maosong Sun
Jie Zhou
29
20
0
25 Oct 2022
Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks
Vikas Raunak
Arul Menezes
32
13
0
24 Oct 2022
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Albert Q. Jiang
Sean Welleck
Jin Peng Zhou
Wenda Li
Jiacheng Liu
M. Jamnik
Timothée Lacroix
Yuhuai Wu
Guillaume Lample
AIMat
67
157
0
21 Oct 2022
Self-Repetition in Abstractive Neural Summarizers
Nikita Salkar
T. Trikalinos
Byron C. Wallace
A. Nenkova
10
10
0
14 Oct 2022
Mitigating Unintended Memorization in Language Models via Alternating Teaching
Zhe Liu
Xuedong Zhang
Fuchun Peng
24
3
0
13 Oct 2022
Noise-Robust De-Duplication at Scale
Emily Silcock
Luca D'Amico-Wong
Jinglin Yang
Melissa Dell
SyDa
31
20
0
09 Oct 2022
Understanding Transformer Memorization Recall Through Idioms
Adi Haviv
Ido Cohen
Jacob Gidron
R. Schuster
Yoav Goldberg
Mor Geva
28
48
0
07 Oct 2022
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Tegan Maharaj
David M. Krueger
Sara Hooker
37
27
0
20 Sep 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gökhan Tür
Premkumar Natarajan
48
82
0
02 Aug 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
129
97
0
01 Jul 2022
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski
Om Thakkar
Florian Tramèr
Daphne Ippolito
Katherine Lee
...
Eric Wallace
Shuang Song
Abhradeep Thakurta
Nicolas Papernot
Chiyuan Zhang
TDI
50
102
0
30 Jun 2022
GitHub Copilot AI pair programmer: Asset or Liability?
Arghavan Moradi Dakhel
Vahid Majdinasab
Amin Nikanjam
Foutse Khomh
Michel C. Desmarais
Zhen Ming (Jack) Jiang
26
331
0
30 Jun 2022
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz
Anders Andreassen
David Dohan
Ethan Dyer
Henryk Michalewski
...
Theo Gutman-Solo
Yuhuai Wu
Behnam Neyshabur
Guy Gur-Ari
Vedant Misra
ReLM
ELM
LRM
56
739
0
29 Jun 2022
Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias
Yarden Tal
Inbal Magar
Roy Schwartz
17
33
0
20 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
48
2,333
0
15 Jun 2022
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian
T. Shamardina
Vladislav Mikhailov
Daniil Chernianskii
Alena Fenogenova
Marat Saidov
A. Valeeva
Tatiana Shavrina
I. Smurov
E. Tutubalina
Ekaterina Artemova
DeLMO
16
30
0
03 Jun 2022
TALM: Tool Augmented Language Models
Aaron T Parisi
Yao-Min Zhao
Noah Fiedel
KELM
RALM
LLMAG
27
144
0
24 May 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
26
185
0
22 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
63
800
0
14 Apr 2022
Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
Florian Tramèr
Reza Shokri
Ayrton San Joaquin
Hoang Minh Le
Matthew Jagielski
Sanghyun Hong
Nicholas Carlini
MIACV
33
106
0
31 Mar 2022
Do Language Models Plagiarize?
Jooyoung Lee
Thai Le
Jinghui Chen
Dongwon Lee
27
73
0
15 Mar 2022
Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
Fatemehsadat Mireshghallah
Kartik Goyal
Archit Uniyal
Taylor Berg-Kirkpatrick
Reza Shokri
MIALM
30
151
0
08 Mar 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
28
274
0
14 Feb 2022
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
R. Thomas McCoy
P. Smolensky
Tal Linzen
Jianfeng Gao
Asli Celikyilmaz
SyDa
22
119
0
18 Nov 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
592
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
253
1,989
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,814
0
14 Dec 2020