arXiv:2307.04401
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
10 July 2023
Zhexin Zhang
Jiaxin Wen
Minlie Huang
Papers citing "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"
22 / 22 papers shown
Title
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Zhexin Zhang
Leqi Lei
Junxiao Yang
Xijie Huang
Yida Lu
...
Xianqi Lei
C. Pan
Lei Sha
H. Wang
Minlie Huang
AAML
43
0
0
24 Feb 2025
RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks
Changyue Jiang
Xudong Pan
Geng Hong
Chenfu Bao
Min Yang
SILM
72
7
0
21 Nov 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
34
0
0
25 Oct 2024
PII-Scope: A Benchmark for Training Data PII Leakage Assessment in LLMs
K. K. Nakka
Ahmed Frikha
Ricardo Mendes
Xue Jiang
Xuebing Zhou
24
1
0
09 Oct 2024
MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense
Yixiang Qiu
Hongyao Yu
Hao Fang
Wenbo Yu
Bin Chen
Shu-Tao Xia
Ke Xu
AAML
22
1
0
07 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
69
7
0
03 Oct 2024
PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
K. K. Nakka
Ahmed Frikha
Ricardo Mendes
Xue Jiang
Xuebing Zhou
24
7
0
03 Jul 2024
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Zhexin Zhang
Junxiao Yang
Pei Ke
Shiyao Cui
Chujie Zheng
Hongning Wang
Minlie Huang
AAML
MU
31
24
0
03 Jul 2024
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang
Tianqing Zhu
Bo Liu
Ming Ding
Xu Guo
Dayong Ye
Wanlei Zhou
Philip S. Yu
PILM
52
16
0
12 Jun 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
38
16
0
08 Jun 2024
Exploring the Privacy Protection Capabilities of Chinese Large Language Models
Yuqi Yang
Xiaowen Huang
Jitao Sang
ELM
PILM
AILaw
30
1
0
27 Mar 2024
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model
Jialun Cao
Wuqi Zhang
S. Cheung
14
15
0
25 Mar 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Zhexin Zhang
Yida Lu
Jingyuan Ma
Di Zhang
Rui Li
...
Hao-Lun Sun
Lei Sha
Zhifang Sui
Hongning Wang
Minlie Huang
18
26
0
26 Feb 2024
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
24
463
0
04 Dec 2023
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen
Pei Ke
Hao-Lun Sun
Zhexin Zhang
Chengfei Li
Jinfeng Bai
Minlie Huang
21
24
0
29 Nov 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhexin Zhang
Junxiao Yang
Pei Ke
Fei Mi
Hongning Wang
Minlie Huang
AAML
16
113
0
15 Nov 2023
The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks
Xiaoyi Chen
Siyuan Tang
Rui Zhu
Shijun Yan
Lei Jin
Zihao Wang
Liya Su
Zhikun Zhang
XiaoFeng Wang
Haixu Tang
AAML
PILM
11
16
0
24 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
38
39
0
16 Oct 2023
Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey
Victoria Smith
Ali Shahin Shamsabadi
Carolyn Ashurst
Adrian Weller
PILM
27
24
0
27 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Minlie Huang
LRM
LM&MA
ELM
16
87
0
13 Sep 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020