arXiv:2307.16888
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
31 July 2023
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
Topics: SILM
Links: arXiv (abs) · PDF · HTML · HuggingFace

Papers citing "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" (showing 6 of 106)

Instruction Backdoor Attacks Against Customized LLMs
Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang
Topics: AAML, SILM
14 Feb 2024

Preference Poisoning Attacks on Reward Model Learning
Junlin Wu, Zhenghao Hu, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
Topics: AAML
02 Feb 2024

Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Y. Wang
30 Jan 2024

Maatphor: Automated Variant Analysis for Prompt Injection Attacks
Ahmed Salem, Andrew Paverd, Boris Köpf
12 Dec 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Weijing Chen, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
Topics: PILM
16 Oct 2023

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu
Topics: SILM, AAML
12 Sep 2023