Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots

28 October 2023

Papers citing "Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots"


When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations. Huaizhi Ge, Yiming Li, Qifan Wang, Yongfeng Zhang, Ruixiang Tang. [AAML, SILM] 72 0 0. 19 Nov 2024
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits. Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt. 40 2 0. 03 Jun 2024
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger. Jiazhao Li, Yijin Yang, Zhuofeng Wu, V. Vydiswaran, Chaowei Xiao. [SILM] 44 42 0. 27 Apr 2023
A Study of the Attention Abnormality in Trojaned BERTs. Weimin Lyu, Songzhu Zheng, Teng Ma, Chao Chen. 51 56 0. 13 May 2022
Few-Shot Backdoor Attacks on Visual Object Tracking. Yiming Li, Haoxiang Zhong, Xingjun Ma, Yong Jiang, Shutao Xia. [AAML] 34 53 0. 31 Jan 2022
Measure and Improve Robustness in NLP Models: A Survey. Xuezhi Wang, Haohan Wang, Diyi Yang. 139 130 0. 15 Dec 2021
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer. Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun. [AAML, SILM] 77 175 0. 14 Oct 2021
Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs. Kuan-Hao Huang, Kai-Wei Chang. 148 68 0. 26 Jan 2021
Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification. Chuanshuai Chen, Jiazhu Dai. [SILM] 53 126 0. 11 Jul 2020