Scalable Extraction of Training Data from (Production) Language Models

28 November 2023
Milad Nasr
Nicholas Carlini
Jonathan Hayase
Matthew Jagielski
A. Feder Cooper
Daphne Ippolito
Christopher A. Choquette-Choo
Eric Wallace
Florian Tramèr
Katherine Lee
    SILM

Papers citing "Scalable Extraction of Training Data from (Production) Language Models"

50 / 281 papers shown
Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
Kunj Joshi
David A. Smith
48
0
0
02 Dec 2025
Quantifying the Privacy Implications of High-Fidelity Synthetic Network Traffic
Van-Tai Tran
Shinan Liu
Tian Li
Nick Feamster
MIACV
523
0
0
25 Nov 2025
For Those Who May Find Themselves on the Red Team
Tyler Shoemaker
34
0
0
23 Nov 2025
Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Hadi Reisizadeh
Jiajun Ruan
Yiwei Chen
Soumyadeep Pal
Sijia Liu
Mingyi Hong
MU
352
0
0
07 Nov 2025
Remembering Unequally: Global and Disciplinary Bias in LLM-Generated Co-Authorship Networks
Ghazal Kalhor
Afra Mashhadi
74
0
0
01 Nov 2025
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline
André V. Duarte
Xuying Li
Bin Zeng
Arlindo L. Oliveira
Lei Li
Zhuo Li
127
0
0
29 Oct 2025
PrivacyGuard: A Modular Framework for Privacy Auditing in Machine Learning
Luca Melis
Matthew Grange
Iden Kalemaj
Karan Chadha
Shengyuan Hu
Elena Kashtelyan
Will Bullock
132
0
0
27 Oct 2025
Leverage Unlearning to Sanitize LLMs
Antoine Boutet
Lucas Magnana
MU, MedIm
193
0
0
24 Oct 2025
CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage
Nowfel Mashnoor
Mohammad Akyash
Hadi M Kamali
Kimia Azar
122
0
0
22 Oct 2025
Extracting alignment data in open models
Federico Barbero
Xiangming Gu
Christopher A. Choquette-Choo
Chawin Sitawarin
Matthew Jagielski
Itay Yona
Petar Velickovic
Ilia Shumailov
Jamie Hayes
183
1
0
21 Oct 2025
An Investigation of Memorization Risk in Healthcare Foundation Models
S. Tonekaboni
Lena Stempfle
Adibvafa Fallahpour
Walter Gerych
Elisa Kreiss
114
0
0
14 Oct 2025
The Model's Language Matters: A Comparative Privacy Analysis of LLMs
Abhishek K. Mishra
Antoine Boutet
Lucas Magnana
PILM
270
0
0
09 Oct 2025
On the Theory of Continual Learning with Gradient Descent for Neural Networks
Hossein Taheri
Avishek Ghosh
Arya Mazumdar
CLL
151
0
0
07 Oct 2025
Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique
Yanming Li
Seifeddine Ghozzi
Cédric Eichler
Nicolas Anciaux
Alexandra Bensamoun
Lorena Gonzalez-Manzano
WaLM
213
0
0
07 Oct 2025
External Data Extraction Attacks against Retrieval-Augmented Large Language Models
Yu He
Yihao Chen
Y. Li
Shuo Shao
Leyi Qi
Boheng Li
Dacheng Tao
Zhan Qin
AAML, SILM
275
1
0
03 Oct 2025
UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models
Yuhao Sun
Zhuoer Xu
Shiwen Cui
Kun Yang
Lingyun Yu
Yongdong Zhang
Hongtao Xie
KELM
88
0
0
02 Oct 2025
Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
Manjiang Yu
Priyanka Singh
Xue Li
Yang Cao
AAML
136
0
0
27 Sep 2025
Federated Learning of Quantile Inference under Local Differential Privacy
Leheng Cai
Qirui Hu
Shuyuan Wu
FedML
108
0
0
26 Sep 2025
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Yehonatan Refael
Guy Smorodinsky
Ofir Lindenbaum
Itay Safran
MIACV, AAML
305
0
0
25 Sep 2025
GEP: A GCG-Based method for extracting personally identifiable information from chatbots built on small language models
Jieli Zhu
Vi Ngoc-Nha Tran
226
0
0
25 Sep 2025
Enterprise AI Must Enforce Participant-Aware Access Control
Shashank Shreedhar Bhatt
Tanmay Rajore
Khushboo Aggarwal
Ganesh Ananthanarayanan
Ranveer Chandra
...
Emre Kiciman
Sumit Kumar Pandey
Srinath T. V. Setty
Rahul Sharma
Teijia Zhao
AAML, SILM
218
1
0
18 Sep 2025
AI-Generated Content in Cross-Domain Applications: Research Trends, Challenges and Propositions
Knowledge-Based Systems (KBS), 2025
Jianxin Li
Liang Qu
Taotao Cai
Zhixue Zhao
Nur Al Hasan Haldar
...
Karen Blackmore
Nasimul Noman
Jingxian Cheng
Ningning Cui
Jianliang Xu
171
1
0
14 Sep 2025
A Biosecurity Agent for Lifecycle LLM Biosecurity Alignment
Meiyin Meng
Zaixi Zhang
LLMAG
165
0
0
13 Sep 2025
Why Data Anonymization Has Not Taken Off
Customer Needs and Solutions (CNS), 2025
Matthew J. Schneider
James Bailie
Dawn Iacobucci
191
1
0
12 Sep 2025
User Privacy and Large Language Models: An Analysis of Frontier Developers' Privacy Policies
Jennifer King
Kevin Klyman
Emily Capstick
Tiffany Saade
Victoria Hsieh
SILM
132
3
0
05 Sep 2025
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Naman D. Singh
Maximilian Müller
Francesco Croce
Matthias Hein
MU, KELM, CLL
192
4
0
02 Sep 2025
Clone What You Can't Steal: Black-Box LLM Replication via Logit Leakage and Distillation
Kanchon Gharami
Hansaka Aluvihare
Shafika Showkat Moni
Berker Peköz
98
1
0
31 Aug 2025
Embodied AI: Emerging Risks and Opportunities for Policy Action
Jared Perlo
Alexander Robey
Fazl Barez
Luciano Floridi
Jakob Mokander
290
2
0
28 Aug 2025
Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models
Qiming Guo
Jinwen Tang
Xingran Huang
156
1
0
25 Aug 2025
On the Edge of Memorization in Diffusion Models
Sam Buchanan
Druv Pai
Yi-An Ma
Valentin De Bortoli
TDI
276
3
0
25 Aug 2025
Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation
Yichi Zhang
Yao Huang
Yifan Wang
Yitong Sun
Chang-rui Liu
...
Xiao Yang
Xingxing Wei
Hang Su
Yinpeng Dong
Jun Zhu
161
1
0
21 Aug 2025
A Study of Privacy-preserving Language Modeling Approaches
Pritilata Saha
Abhirup Sinha
PILM
236
0
0
21 Aug 2025
Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous
Ben Nassi
Stav Cohen
Or Yair
127
3
0
16 Aug 2025
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu
Xuying Li
Qirui Wang
Yuji Kosuga
Mengqiu Tian
Zhuo Li
AAML, SILM
186
0
0
14 Aug 2025
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
Skyler Hallinan
Jaehun Jung
Melanie Sclar
Ximing Lu
Abhilasha Ravichander
Sahana Ramnath
Yejin Choi
Sai Praneeth Karimireddy
Niloofar Mireshghallah
Xiang Ren
AAML, MLAU
304
2
0
13 Aug 2025
PETLP: A Privacy-by-Design Pipeline for Social Media Data in AI Research
Nick Oh
Giorgos D. Vrakas
Siân J. M. Brooke
Sasha Morinière
Toju Duke
AILaw
177
0
0
12 Aug 2025
Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
Badrinath Ramakrishnan
Akshaya Balaji
MU, PILM
283
1
0
10 Aug 2025
Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy
Jairo Gudiño-Rosero
Clément Contet
Umberto Grandi
César A. Hidalgo
AAML, SILM
199
0
0
06 Aug 2025
Current State in Privacy-Preserving Text Preprocessing for Domain-Agnostic NLP
Abhirup Sinha
Pritilata Saha
Tithi Saha
AILaw
100
0
0
05 Aug 2025
Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs
Jérémie Dentan
Davide Buscaldi
Sonia Vanier
191
0
0
04 Aug 2025
Bridging AI Innovation and Healthcare Needs: Lessons Learned from Incorporating Modern NLP at The BC Cancer Registry
Lovedeep Gondara
Gregory Arbour
Raymond Ng
Jonathan Simkin
Shebnum Devji
129
0
0
27 Jul 2025
Differentiating hype from practical applications of large language models in medicine - a primer for healthcare professionals
Elisha D.O. Roberson
LM&MA
88
0
0
25 Jul 2025
PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
Pengfei Du
AAML
148
2
0
14 Jul 2025
Memorization Sinks: Isolating Memorization during LLM Training
Gaurav R. Ghosal
Pratyush Maini
Aditi Raghunathan
MU
240
4
0
14 Jul 2025
PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
Krishna Kanth Nakka
Xue Jiang
Dmitrii Usynin
Xuebing Zhou
LLMSV
249
0
0
03 Jul 2025
InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy
Vishnu Vinod
Krishna Pillutla
Abhradeep Thakurta
SILM, SyDa
152
1
0
30 Jun 2025
A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset
Rachel Hong
Jevan Hutson
William Agnew
Imaad Huda
Tadayoshi Kohno
Jamie Morgenstern
AILaw
350
3
0
20 Jun 2025
Approximating Language Model Training Data from Weights
John X. Morris
Junjie Oscar Yin
Woojeong Kim
Vitaly Shmatikov
Alexander M. Rush
260
2
0
18 Jun 2025
SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation
ACM Asia Conference on Computer and Communications Security (AsiaCCS), 2025
Yashothara Shanmugarasa
Ming Ding
M. Chamikara
Thierry Rakotoarivelo
PILM, AILaw
436
10
0
15 Jun 2025
Memorization in Language Models through the Lens of Intrinsic Dimension
Stefan Arnold
PILM
321
1
0
11 Jun 2025