2306.17194
On the Exploitability of Instruction Tuning
28 June 2023
Manli Shu
Jiong Wang
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
Papers citing
"On the Exploitability of Instruction Tuning"
22 / 72 papers shown
Title
Security and Privacy Challenges of Large Language Models: A Survey
B. Das
M. H. Amini
Yanzhao Wu
PILM
ELM
17
101
0
30 Jan 2024
One-Shot Learning as Instruction Data Prospector for Large Language Models
Yunshui Li
Binyuan Hui
Xiaobo Xia
Jiaxi Yang
Min Yang
...
Ling-Hao Chen
Junhao Liu
Tongliang Liu
Fei Huang
Yongbin Li
33
31
0
16 Dec 2023
Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack
Yu Fu
Yufei Li
Wen Xiao
Cong Liu
Yue Dong
AAML
29
5
0
12 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
24
468
0
04 Dec 2023
The Philosopher's Stone: Trojaning Plugins of Large Language Models
Tian Dong
Minhui Xue
Guoxing Chen
Rayne Holland
Shaofeng Li
Yan Meng
Zhen Liu
Haojin Zhu
AAML
13
9
0
01 Dec 2023
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu
Yichen Zhu
Jindong Gu
Yunshi Lan
Chao Yang
Yu Qiao
19
80
0
29 Nov 2023
MoDS: Model-oriented Data Selection for Instruction Tuning
Qianlong Du
Chengqing Zong
Jiajun Zhang
ALM
24
75
0
27 Nov 2023
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang
Junlin Wu
Muhao Chen
Yevgeniy Vorobeychik
Chaowei Xiao
AAML
13
12
0
16 Nov 2023
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu
Ruiyi Zhang
Bang An
Gang Wu
Joe Barrow
Zichao Wang
Furong Huang
A. Nenkova
Tong Sun
SILM
AAML
30
40
0
23 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
38
40
0
16 Oct 2023
Misusing Tools in Large Language Models With Visual Adversarial Examples
Xiaohan Fu
Zihan Wang
Shuheng Li
Rajesh K. Gupta
Niloofar Mireshghallah
Taylor Berg-Kirkpatrick
Earlence Fernandes
AAML
15
24
0
04 Oct 2023
Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models
Xianjun Yang
Xiao Wang
Qi Zhang
Linda R. Petzold
William Yang Wang
Xun Zhao
Dahua Lin
18
160
0
04 Oct 2023
The Poison of Alignment
Aibek Bekbayev
Sungbae Chun
Yerzat Dulat
James Yamazaki
18
9
0
25 Aug 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
31
75
0
24 Aug 2023
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li
Baolin Peng
Pengcheng He
Xifeng Yan
ELM
SILM
AAML
33
23
0
17 Aug 2023
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan
Vikas Yadav
Shiyang Li
Lichang Chen
Zheng Tang
Hai Wang
Vijay Srinivasan
Xiang Ren
Hongxia Jin
SILM
15
75
0
31 Jul 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
90
124
0
01 May 2023
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
Jiazhao Li
Yijin Yang
Zhuofeng Wu
V. Vydiswaran
Chaowei Xiao
SILM
41
42
0
27 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
579
0
06 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
205
1,654
0
15 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,835
0
18 Apr 2021