Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

10 July 2023

Zhexin Zhang

Jiaxin Wen

Shiyu Huang

Papers citing "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"

CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense

309

13 Oct 2025

Current State in Privacy-Preserving Text Preprocessing for Domain-Agnostic NLP

163

05 Aug 2025

SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation

ACM Asia Conference on Computer and Communications Security (AsiaCCS), 2025

Yashothara Shanmugarasa

Ming Ding

M. Chamikara

Thierry Rakotoarivelo

PILM AILaw

555

15 Jun 2025

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Minlie Huang

396

21 May 2025

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

Zhexin Zhang

Xian Qi Loye

Victor Shea-Jay Huang

...

403

21 May 2025

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

...

283

24 Feb 2025

R.R.: Unveiling LLM Training Privacy through Recollection and Ranking

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

400

18 Feb 2025

Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization

425

25 Oct 2024

MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense

Hao Fang

Bin Chen

339

07 Oct 2024

Undesirable Memorization in Large Language Models: A Survey

715

03 Oct 2024

PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

502

03 Jul 2024

Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey

490

12 Jun 2024

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

...

Lu Cheng

354

08 Jun 2024

Exploring the Privacy Protection Capabilities of Chinese Large Language Models

256

27 Mar 2024

Concerned with Data Contamination? Assessing Countermeasures in Code Language Model

Jialun Cao

Wuqi Zhang

Shing-Chi Cheung

472

25 Mar 2024

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

...

Lei Sha

Zhifang Sui

Hongning Wang

Shiyu Huang

147

26 Feb 2024

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

High-Confidence Computing (HC), 2023

669

1,127

04 Dec 2023

Unveiling the Implicit Toxicity in Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

180

29 Nov 2023

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

487

193

15 Nov 2023

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

Conference on Computer and Communications Security (CCS), 2023

271

24 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions

512

16 Oct 2023

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Victoria Smith

Ali Shahin Shamsabadi

Carolyn Ashurst

Adrian Weller

PILM

539

27 Sep 2023

SafetyBench: Evaluating the Safety of Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xiao Liu

383

196

13 Sep 2023

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

Neural Information Processing Systems (NeurIPS), 2020

Vitaly Feldman

Chiyuan Zhang

TDI

720

602

09 Aug 2020