ResearchTrend.AI
Deduplicating Training Data Mitigates Privacy Risks in Language Models


14 February 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
    PILM
    MU

Papers citing "Deduplicating Training Data Mitigates Privacy Risks in Language Models"

50 / 62 papers shown
Title
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li
Weijian Ma
Xueyang Li
Yunzhong Lou
G. Zhou
Xiangdong Zhou
32
0
0
07 May 2025
DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang
Ruoxi Cheng
26
0
0
07 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
50
0
0
02 May 2025
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation
Qianren Mao
Qili Zhang
Hanwen Hao
Zhentao Han
Runhua Xu
...
Bo Li
Y. Song
Jin Dong
Jianxin Li
Philip S. Yu
66
0
0
27 Apr 2025
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
C. L. P. Chen
Daochang Liu
M. Shah
Chang Xu
62
1
0
25 Apr 2025
Measuring Déjà vu Memorization Efficiently
Narine Kokhlikyan
Bargav Jayaraman
Florian Bordes
Chuan Guo
Kamalika Chaudhuri
25
1
0
08 Apr 2025
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
Layba Fiaz
Munief Hassan Tahir
Sana Shams
Sarmad Hussain
49
0
0
24 Feb 2025
The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Matthieu Meeus
Lukas Wutschitz
Santiago Zanella Béguelin
Shruti Tople
Reza Shokri
75
0
0
24 Feb 2025
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
Ivoline Ngong
Swanand Kadhe
Hao Wang
K. Murugesan
Justin D. Weisz
Amit Dhurandhar
K. Ramamurthy
44
2
0
22 Feb 2025
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
M. Russinovich
Ahmed Salem
MU
CLL
57
0
0
20 Feb 2025
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
Dayong Ye
Tianqing Zhu
J. Li
Kun Gao
B. Liu
L. Zhang
Wanlei Zhou
Y. Zhang
AAML
MU
77
0
0
28 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
88
12
0
31 Dec 2024
Mind the Data Gap: Bridging LLMs to Enterprise Data Integration
Moe Kayali
Fabian Wenz
Nesime Tatbul
Çağatay Demiralp
44
2
0
31 Dec 2024
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
C. L. P. Chen
Daochang Liu
M. Shah
Chang Xu
60
3
0
29 Oct 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
55
3
0
24 Oct 2024
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim
Yang Li
Evangelia Spiliopoulou
Jie Ma
Miguel Ballesteros
William Yang Wang
MIALM
90
3
2
10 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan
Tianyu Pang
Chao Du
Kejiang Chen
Weiming Zhang
Min-Bin Lin
MU
41
5
0
10 Oct 2024
Mitigating Memorization In Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Nathaniel Hudson
Caleb Geniesse
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
KELM
MU
50
0
0
03 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
69
7
0
03 Oct 2024
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
Tianle Gu
Kexin Huang
Ruilin Luo
Yuanqi Yao
Yujiu Yang
Yan Teng
Yingchun Wang
MU
26
4
0
18 Sep 2024
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
37
4
0
29 Jul 2024
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
Haoyu Tang
Ye Liu
Xukai Liu
Yanghai Zhang
Kai Zhang
Xiaofang Zhou
Enhong Chen
MU
67
3
0
25 Jul 2024
Training Foundation Models as Data Compression: On Information, Model Weights and Copyright Law
Giorgio Franceschelli
Claudia Cevenini
Mirco Musolesi
44
0
0
18 Jul 2024
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Zhenhua Liu
Tong Zhu
Chuanyuan Tan
Wenliang Chen
PILM
MU
39
8
0
14 Jul 2024
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth
Alvin Deng
Kyle O'Brien
Jyothir S V
Mohammad Aflah Khan
...
Jacob Ray Fuehne
Stella Biderman
Tracy Ke
Katherine Lee
Naomi Saphra
55
12
0
25 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
KELM
MU
63
4
0
13 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
34
1
0
05 Jun 2024
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
39
3
0
24 May 2024
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
38
1
0
19 May 2024
Understanding (Un)Intended Memorization in Text-to-Image Generative Models
Ali Naseh
Jaechul Roh
Amir Houmansadr
DiffM
20
6
0
06 Dec 2023
RETSim: Resilient and Efficient Text Similarity
Marina Zhang
Owen Vallis
Aysegul Bumin
Tanay Vakharia
Elie Bursztein
23
1
0
28 Nov 2023
Leveraging Large Language Models for Collective Decision-Making
Marios Papachristou
Longqi Yang
Chin-Chia Hsu
LLMAG
31
2
0
03 Nov 2023
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework
Imdad Ullah
Najm Hassan
S. Gill
Basem Suleiman
T. Ahanger
Zawar Shah
Junaid Qadir
S. Kanhere
35
16
0
19 Oct 2023
Quantifying and Analyzing Entity-level Memorization in Large Language Models
Zhenhong Zhou
Jiuyang Xiang
Chao-Yi Chen
Sen Su
PILM
33
8
0
30 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
23
13
0
31 Jul 2023
What can we learn from Data Leakage and Unlearning for Law?
Jaydeep Borkar
PILM
MU
23
10
0
19 Jul 2023
Deduplicating and Ranking Solution Programs for Suggesting Reference Solutions
Atsushi Shirafuji
Yutaka Watanobe
19
1
0
16 Jul 2023
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang
Jiaxin Wen
Minlie Huang
25
29
0
10 Jul 2023
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raff
19
116
0
21 Apr 2023
Secret-Keeping in Question Answering
Nathaniel W. Rollings
Kent O'Sullivan
Sakshum Kulshrestha
KELM
24
0
0
16 Mar 2023
Bounding Training Data Reconstruction in DP-SGD
Jamie Hayes
Saeed Mahloujifar
Borja Balle
AAML
FedML
21
39
0
14 Feb 2023
Extracting Training Data from Diffusion Models
Nicholas Carlini
Jamie Hayes
Milad Nasr
Matthew Jagielski
Vikash Sehwag
Florian Tramèr
Borja Balle
Daphne Ippolito
Eric Wallace
DiffM
63
569
0
30 Jan 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
Emmanuel Abbe
Samy Bengio
Aryo Lotfi
Kevin Rizk
LRM
28
47
0
30 Jan 2023
Large Language Models Struggle to Learn Long-Tail Knowledge
Nikhil Kandpal
H. Deng
Adam Roberts
Eric Wallace
Colin Raffel
RALM
KELM
36
380
0
15 Nov 2022
Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe
Xiang Yue
Huseyin A. Inan
Xuechen Li
Girish Kumar
Julia McAnallen
Hoda Shajari
Huan Sun
David Levitan
Robert Sim
36
79
0
25 Oct 2022
Noise-Robust De-Duplication at Scale
Emily Silcock
Luca D'Amico-Wong
Jinglin Yang
Melissa Dell
SyDa
26
20
0
09 Oct 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gökhan Tür
Premkumar Natarajan
39
82
0
02 Aug 2022
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski
Om Thakkar
Florian Tramèr
Daphne Ippolito
Katherine Lee
...
Eric Wallace
Shuang Song
Abhradeep Thakurta
Nicolas Papernot
Chiyuan Zhang
TDI
40
102
0
30 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
43
2,333
0
15 Jun 2022