arXiv:2307.16888
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
31 July 2023
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
Topics: SILM
Links: arXiv (abs) · PDF · HTML · HuggingFace

Papers citing "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" (showing 6 of 106)

Instruction Backdoor Attacks Against Customized LLMs
Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang
Topics: AAML, SILM
14 Feb 2024

Preference Poisoning Attacks on Reward Model Learning
Junlin Wu, Zhenghao Hu, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
Topics: AAML
02 Feb 2024

Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Y. Wang
30 Jan 2024

Maatphor: Automated Variant Analysis for Prompt Injection Attacks
Ahmed Salem, Andrew Paverd, Boris Köpf
12 Dec 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Weijing Chen, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
Topics: PILM
16 Oct 2023

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu
Topics: SILM, AAML
12 Sep 2023