ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

20 November 2020
Fanchao Qi
Yangyi Chen
Mukai Li
Yuan Yao
Zhiyuan Liu
Maosong Sun
    AAML

Papers citing "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"

50 / 164 papers shown
Title
Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers
Terry Tong
Jiashu Xu
Qin Liu
Muhao Chen
AAML
SILM
37
1
0
04 Jul 2024
SOS! Soft Prompt Attack Against Open-Source Large Language Models
Ziqing Yang
Michael Backes
Yang Zhang
Ahmed Salem
AAML
38
6
0
03 Jul 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Yi Zeng
Weiyu Sun
Tran Ngoc Huynh
Dawn Song
Bo Li
Ruoxi Jia
AAML
LLMSV
35
17
0
24 Jun 2024
Attack and Defense of Deep Learning Models in the Field of Web Attack Detection
Lijia Shi
Shihao Dong
AAML
25
0
0
18 Jun 2024
FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation
Bangzheng Li
Ben Zhou
Xingyu Fu
Fei Wang
Dan Roth
Muhao Chen
26
3
0
17 Jun 2024
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang
Tianqing Zhu
Bo Liu
Ming Ding
Xu Guo
Dayong Ye
Wanlei Zhou
Philip S. Yu
PILM
57
17
0
12 Jun 2024
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li
Yusen Zhang
Renze Lou
Chen Wu
Jiaqi Wang
LRM
AAML
37
11
0
10 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
22
6
0
09 Jun 2024
PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning
Tianrong Zhang
Zhaohan Xi
Ting Wang
Prasenjit Mitra
Jinghui Chen
AAML
SILM
27
2
0
06 Jun 2024
Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu
Chenhui Hu
AAML
37
7
0
01 Jun 2024
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Yuzhou Nie
Yanting Wang
Jinyuan Jia
Michael J. De Lucia
Nathaniel D. Bastian
Wenbo Guo
Dawn Song
SILM
AAML
34
5
0
27 May 2024
SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks
Xuanli He
Qiongkai Xu
Jun Wang
Benjamin I. P. Rubinstein
Trevor Cohn
AAML
35
4
0
19 May 2024
BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
33
5
0
18 May 2024
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Hwiyeol Jo
Taiwoo Park
Nayoung Choi
Changbong Kim
Ohjoon Kwon
...
Kyoungho Shin
Sun Suk Lim
Kyungmi Kim
Jihye Lee
Sun Kim
60
0
0
05 Apr 2024
Exploring Backdoor Vulnerabilities of Chat Models
Yunzhuo Hao
Wenkai Yang
Yankai Lin
SILM
KELM
21
9
0
03 Apr 2024
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf
Qin Liu
Muhao Chen
AAML
27
8
0
02 Apr 2024
Task-Agnostic Detector for Insertion-Based Backdoor Attacks
Weimin Lyu
Xiao Lin
Songzhu Zheng
Lu Pang
Haibin Ling
Susmit Jha
Chao Chen
43
25
0
25 Mar 2024
LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Yue Xu
Wenjie Wang
SILM
AAML
26
2
0
25 Mar 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
PILM
65
24
0
19 Mar 2024
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge
Ansh Arora
Xuanli He
Maximilian Mozes
Srinibas Swain
Mark Dras
Qiongkai Xu
SILM
MoMe
AAML
56
12
0
29 Feb 2024
On Trojan Signatures in Large Language Models of Code
Aftab Hussain
Md Rafiqul Islam Rabin
Mohammad Amin Alipour
41
2
0
23 Feb 2024
Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs
Xiaoxia Li
Siyuan Liang
Jiyi Zhang
Hansheng Fang
Aishan Liu
Ee-Chien Chang
90
24
0
21 Feb 2024
Learning to Poison Large Language Models During Instruction Tuning
Yao Qiang
Xiangyu Zhou
Saleh Zare Zade
Mohammad Amin Roshani
Douglas Zytko
Dongxiao Zhu
AAML
SILM
32
20
0
21 Feb 2024
Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao
Leilei Gan
Anh Tuan Luu
Jie Fu
Lingjuan Lyu
Meihuizi Jia
Jinming Wen
AAML
21
22
0
19 Feb 2024
Purifying Large Language Models by Ensembling a Small Language Model
Tianlin Li
Qian Liu
Tianyu Pang
Chao Du
Qing-Wu Guo
Yang Liu
Min-Bin Lin
46
16
0
19 Feb 2024
Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
Zongru Wu
Zhuosheng Zhang
Pengzhou Cheng
Gongshen Liu
AAML
39
4
0
19 Feb 2024
Instruction Backdoor Attacks Against Customized LLMs
Rui Zhang
Hongwei Li
Rui Wen
Wenbo Jiang
Yuan Zhang
Michael Backes
Yun Shen
Yang Zhang
AAML
SILM
30
21
0
14 Feb 2024
OrderBkd: Textual backdoor attack through repositioning
Irina Alekseevskaia
Konstantin Arkhipenko
17
2
0
12 Feb 2024
Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers
Lei Xu
Sarah Alnegheimish
Laure Berti-Equille
Alfredo Cuesta-Infante
K. Veeramachaneni
AAML
19
0
0
30 Jan 2024
Security and Privacy Challenges of Large Language Models: A Survey
B. Das
M. H. Amini
Yanzhao Wu
PILM
ELM
19
101
0
30 Jan 2024
Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
Shuai Zhao
Meihuizi Jia
Anh Tuan Luu
Fengjun Pan
Jinming Wen
AAML
29
35
0
11 Jan 2024
A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi
Daniel Wankit Yip
C. Chan
AAML
30
11
0
18 Dec 2023
TrojFSP: Trojan Insertion in Few-shot Prompt Tuning
Meng Zheng
Jiaqi Xue
Xun Chen
YanShan Wang
Qian Lou
Lei Jiang
AAML
23
7
0
16 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
29
473
0
04 Dec 2023
Grounding Foundation Models through Federated Transfer Learning: A General Framework
Yan Kang
Tao Fan
Hanlin Gu
Xiaojin Zhang
Lixin Fan
Qiang Yang
AI4CE
68
19
0
29 Nov 2023
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
Hengzhi Pei
Jinyuan Jia
Wenbo Guo
Bo-wen Li
Dawn Song
SILM
21
9
0
19 Nov 2023
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang
Xiangyu Zhou
Dongxiao Zhu
30
32
0
16 Nov 2023
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Wenjie Mo
Jiashu Xu
Qin Liu
Jiong Wang
Jun Yan
Chaowei Xiao
Muhao Chen
AAML
54
17
0
16 Nov 2023
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots
Ruixiang Tang
Jiayi Yuan
Yiming Li
Zirui Liu
Rui Chen
Xia Hu
AAML
31
13
0
28 Oct 2023
Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
Wencong You
Zayd Hammoudeh
Daniel Lowd
AAML
11
12
0
28 Oct 2023
Attention-Enhancing Backdoor Attacks Against BERT-based Models
Weimin Lyu
Songzhu Zheng
Lu Pang
Haibin Ling
Chao Chen
21
34
0
23 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
38
41
0
16 Oct 2023
Composite Backdoor Attacks Against Large Language Models
Hai Huang
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
17
36
0
11 Oct 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
18
21
0
28 Sep 2023
Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
Zhaohan Xi
Tianyu Du
Changjiang Li
Ren Pang
S. Ji
Jinghui Chen
Fenglong Ma
Ting Wang
AAML
8
29
0
23 Sep 2023
Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng
Zongru Wu
Wei Du
Haodong Zhao
Wei Lu
Gongshen Liu
SILM
AAML
18
17
0
12 Sep 2023
LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors
Chengkun Wei
Wenlong Meng
Zhikun Zhang
M. Chen
Ming-Hui Zhao
Wenjing Fang
Lei Wang
Zihui Zhang
Wenzhi Chen
AAML
13
8
0
26 Aug 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
31
76
0
24 Aug 2023
TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models
Indranil Sur
Karan Sikka
Matthew Walmer
K. Koneripalli
Anirban Roy
Xiaoyu Lin
Ajay Divakaran
Susmit Jha
24
8
0
07 Aug 2023
ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP
Lu Yan
Zhuo Zhang
Guanhong Tao
Kaiyuan Zhang
Xuan Chen
Guangyu Shen
Xiangyu Zhang
AAML
SILM
54
16
0
04 Aug 2023