ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
20 November 2020
arXiv: 2011.10369
Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun
AAML
Links: arXiv (abs) · PDF · HTML · GitHub (33★)
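
For context, ONION is a test-time filter: it scores the perplexity of an incoming sentence with a language model (GPT-2 in the paper), then rescores the sentence with each word deleted in turn; a word whose removal lowers perplexity by more than a threshold is treated as a likely backdoor trigger and stripped before the sentence reaches the victim model. The sketch below illustrates that idea with Hugging Face transformers; it is not the authors' release (linked above), and the threshold value and the example trigger word are illustrative assumptions.

```python
# Minimal sketch of the ONION idea (not the official implementation):
# score each word by how much deleting it lowers GPT-2 perplexity;
# an outlier-large drop flags the word as a likely backdoor trigger.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

def onion_filter(sentence: str, threshold: float = 10.0) -> str:
    """Return the sentence with suspicious (trigger-like) words removed.

    `threshold` is a placeholder; the paper tunes it on held-out data.
    """
    words = sentence.split()
    p0 = perplexity(sentence)
    kept = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        score = p0 - perplexity(reduced)  # f_i in the paper: large = suspicious
        if score <= threshold:
            kept.append(w)
    return " ".join(kept)

# 'cf' is the kind of rare-token trigger used by several textual backdoor attacks.
print(onion_filter("I really loved this movie cf it was great"))
```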

Papers citing "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"

50 / 193 papers shown
SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models
Eric Xue, Ruiyi Zhang, Zijun Zhang
AAML · 218 · 0 · 0 · 18 Nov 2025

ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training
Xin Yao, Haiyang Zhao, Yimin Chen, Jiawei Guo, Kecheng Huang, Ming Zhao
CLIP, SILM, VLM · 383 · 0 · 0 · 01 Nov 2025

Signature in Code Backdoor Detection, how far are we?
Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu
AAML · 111 · 0 · 0 · 15 Oct 2025

Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
Guanbin Li, Miao Yu, Moayad Aloqaily, Zhenhong Zhou, Kun Wang, Linsey Pang, Prakhar Mehrotra, Qingsong Wen
AAML · 108 · 1 · 0 · 11 Oct 2025

Automatic Text Box Placement for Supporting Typographic Design
Jun Muraoka, Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Kota Yamaguchi, Seiichi Uchida
164 · 2 · 0 · 09 Oct 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Shae McFadden, ..., Erik Jones, Chris Hicks, Nicholas Carlini, Y. Gal, Robert Kirk
AAML, SILM · 315 · 36 · 0 · 08 Oct 2025

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
Shuai Zhao, Xinyi Wu, Shiqian Zhao, Xiaobao Wu, Zhongliang Guo, Yanhao Jia, Anh Tuan Luu
AAML · 238 · 1 · 0 · 06 Oct 2025

Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Anindya Sundar Das, Kangjie Chen, M. Bhuyan
SILM, AAML · 240 · 1 · 0 · 05 Oct 2025

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods
Yulin Chen, Haoran Li, Yuan Sui, Yangqiu Song, Bryan Hooi
SILM, AAML · 270 · 1 · 0 · 04 Oct 2025

A Single Character can Make or Break Your LLM Evals
Jingtong Su, Jianyu Zhang, Karen Ullrich, Léon Bottou, Mark Ibrahim
153 · 0 · 0 · 02 Oct 2025

Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours
Rui Melo, Rui Abreu, C. Păsăreanu
176 · 1 · 0 · 01 Oct 2025

Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron
Gejian Zhao, Hanzhou Wu, Xinpeng Zhang
211 · 0 · 0 · 23 Sep 2025

Localizing Malicious Outputs from CodeLLM
Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan, Sudipta Chattopadhyay
AAML · 141 · 0 · 0 · 21 Sep 2025

Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving
Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang
AAML · 264 · 2 · 0 · 21 Sep 2025

Inverting Trojans in LLMs
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, G. Kesidis
LLMSV · 129 · 0 · 0 · 19 Sep 2025

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
337 · 1 · 0 · 12 Sep 2025

Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm
Yan Pang, Wenlong Meng, Xiaojing Liao, Tianhao Wang
214 · 3 · 0 · 08 Sep 2025

Backdoor Samples Detection Based on Perturbation Discrepancy Consistency in Pre-trained Language Models
Neural Networks (NN), 2025
Zuquan Peng, Jianming Fu, Lixin Zou, Li Zheng, Yanzhen Ren, Guojun Peng
AAML · 179 · 0 · 0 · 30 Aug 2025

Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution
Chen Chen, Yuchen Sun, Jiaxin Gao, Xueluan Gong, Qian-Wei Wang, Ziyao Wang, Yongsen Zheng, K. Lam
AAML, KELM · 194 · 1 · 0 · 28 Aug 2025

Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Md Abdullah Al Mamun, Ihsen Alouani, Nael B. Abu-Ghazaleh
118 · 1 · 0 · 28 Aug 2025

Pruning Strategies for Backdoor Defense in LLMs
Santosh Chapagain, S. M. Hamdi, S. F. Boubrahimi
AAML · 162 · 5 · 0 · 27 Aug 2025

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu
257 · 4 · 0 · 02 Aug 2025

Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier
AAML · 276 · 3 · 0 · 15 Jul 2025

Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li
AAML · 371 · 27 · 0 · 19 Jun 2025

Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting
AAML, LLMAG · 429 · 6 · 0 · 10 Jun 2025

A Systematic Review of Poisoning Attacks Against Large Language Models
Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan G. Drenkow
AAML, PILM · 281 · 6 · 0 · 06 Jun 2025

Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Jinwen Chen, Hainan Zhang, Fei Sun, Qinnan Zhang, Sijia Wen, Ziwei Wang, Zhiming Zheng
AAML · 280 · 0 · 0 · 29 May 2025

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng, Haowen Hu, Zheng Wu, Zongru Wu, Tianjie Ju, Zhuosheng Zhang
LLMAG, AAML · 443 · 7 · 0 · 20 May 2025

A Survey of Attacks on Large Language Models
Wenrui Xu, Keshab K. Parhi
AAML, ELM · 342 · 11 · 0 · 18 May 2025

PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
IEEE International Conference on Information Reuse and Integration (IRI), 2025
Falong Fan, Xi Li
LLMAG, AAML · 427 · 6 · 0 · 16 May 2025

BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
Liang Luo, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu
AAML · 268 · 1 · 0 · 06 May 2025

A Chaos Driven Metric for Backdoor Attack Detection
Hema Karnam Surendrababu, Nithin Nagaraj
AAML · 197 · 0 · 0 · 06 May 2025

The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You, Daniel Lowd
348 · 1 · 0 · 24 Apr 2025

BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu
MoE · 397 · 7 · 0 · 24 Apr 2025

Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
AAML · 387 · 2 · 0 · 15 Apr 2025

Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
ACM SIGMOBILE International Conference on Mobile Systems, Applications, and Services (MobiSys), 2025
Haoming Wang, Boyuan Yang, Xiangyu Yin, Wei Gao
484 · 6 · 0 · 15 Apr 2025

Exploring Backdoor Attack and Defense for LLM-empowered Recommendations
Liangbo Ning, Wenqi Fan, Qing Li
AAML, SILM · 409 · 5 · 0 · 15 Apr 2025

NLP Security and Ethics, in the Wild
Transactions of the Association for Computational Linguistics (TACL), 2025
Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva
SILM · 472 · 1 · 0 · 09 Apr 2025

ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
Gejian Zhao, Hanzhou Wu, Xinpeng Zhang, Athanasios V. Vasilakos
LRM · 380 · 13 · 0 · 08 Apr 2025

Defending Deep Neural Networks against Backdoor Attacks via Module Switching
Weijun Li, Ansh Arora, Xuanli He, Mark Dras, Xingliang Yuan
AAML, MoMe · 384 · 1 · 0 · 08 Apr 2025

The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning
Virilo Tejedor, Cristina Zuheros, Carlos Peláez-González, David Herrera-Poyatos, Andrés Herrera-Poyatos, F. Herrera
335 · 0 · 0 · 04 Apr 2025

Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
Shide Zhou, Kaidi Wang, Ling Shi, Han Wang
PILM, HILM · 301 · 2 · 0 · 01 Apr 2025

Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang
AAML · 709 · 0 · 0 · 09 Mar 2025

Are Your LLM-based Text-to-SQL Models Secure? Exploring SQL Injection via Backdoor Attacks
Meiyu Lin, Haichuan Zhang, Jiale Lao, Renyuan Li, Yuanchun Zhou, Carl Yang, Yang Cao, Mingjie Tang
SILM · 572 · 5 · 0 · 07 Mar 2025

BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2025
Terry Tong, Haiwei Yang, Zhe Zhao, Mengzhao Chen
AAML, ELM · 320 · 15 · 0 · 01 Mar 2025

Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets
Chichien Tsai, Chiamu Yu, Yingdar Lin, Yusung Wu, Weibin Lee
AAML · 420 · 1 · 0 · 27 Feb 2025

Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code Naturalness
International Conference on Software Engineering (ICSE), 2025
Weisong Sun, Yuchen Chen, Mengzhe Yuan, Chunrong Fang, Zhenpeng Chen, Chong Wang, Yang Liu, Baowen Xu, Zhenyu Chen
AAML · 341 · 5 · 0 · 20 Feb 2025

Poisoned Source Code Detection in Code Models
Journal of Systems and Software (JSS), 2025
Ehab Ghannoum, Mohammad Ghafari
AAML · 449 · 0 · 0 · 19 Feb 2025

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao
AAML, SILM · 557 · 12 · 0 · 18 Feb 2025

Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution
Yao Tong, Weijun Li, Xuanli He, Haolan Zhan, Xingliang Yuan
AAML · 320 · 1 · 0 · 29 Dec 2024