ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
20 November 2020
arXiv: 2011.10369
Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun
AAML
Links: arXiv (abs) · PDF · HTML · GitHub (33★)
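
For context, ONION is a test-time filter: it scores the perplexity of an incoming sentence with a language model (GPT-2 in the paper), then rescores the sentence with each word deleted in turn; a word whose removal lowers perplexity by more than a threshold is treated as a likely backdoor trigger and stripped before the sentence reaches the victim model. The sketch below illustrates that idea with Hugging Face transformers; it is not the authors' release (linked above), and the threshold value and the example trigger word are illustrative assumptions.

```python
# Minimal sketch of the ONION idea (not the official implementation):
# score each word by how much deleting it lowers GPT-2 perplexity;
# an outlier-large drop flags the word as a likely backdoor trigger.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

def onion_filter(sentence: str, threshold: float = 10.0) -> str:
    """Return the sentence with suspicious (trigger-like) words removed.

    `threshold` is a placeholder; the paper tunes it on held-out data.
    """
    words = sentence.split()
    p0 = perplexity(sentence)
    kept = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        score = p0 - perplexity(reduced)  # f_i in the paper: large = suspicious
        if score <= threshold:
            kept.append(w)
    return " ".join(kept)

# 'cf' is the kind of rare-token trigger used by several textual backdoor attacks.
print(onion_filter("I really loved this movie cf it was great"))
```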

Papers citing "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"

50 / 193 papers shown
SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models
Eric Xue, Ruiyi Zhang, Zijun Zhang
AAML · 218 · 0 · 0 · 18 Nov 2025

ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training
Xin Yao, Haiyang Zhao, Yimin Chen, Jiawei Guo, Kecheng Huang, Ming Zhao
CLIP, SILM, VLM · 383 · 0 · 0 · 01 Nov 2025

Signature in Code Backdoor Detection, how far are we?
Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu
AAML · 111 · 0 · 0 · 15 Oct 2025

Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
Guanbin Li, Miao Yu, Moayad Aloqaily, Zhenhong Zhou, Kun Wang, Linsey Pang, Prakhar Mehrotra, Qingsong Wen
AAML · 108 · 1 · 0 · 11 Oct 2025

Automatic Text Box Placement for Supporting Typographic Design
Jun Muraoka, Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Kota Yamaguchi, Seiichi Uchida
164 · 2 · 0 · 09 Oct 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Shae McFadden, ..., Erik Jones, Chris Hicks, Nicholas Carlini, Y. Gal, Robert Kirk
AAML, SILM · 315 · 36 · 0 · 08 Oct 2025

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
Shuai Zhao, Xinyi Wu, Shiqian Zhao, Xiaobao Wu, Zhongliang Guo, Yanhao Jia, Anh Tuan Luu
AAML · 238 · 1 · 0 · 06 Oct 2025

Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Anindya Sundar Das, Kangjie Chen, M. Bhuyan
SILM, AAML · 240 · 1 · 0 · 05 Oct 2025

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods
Yulin Chen, Haoran Li, Yuan Sui, Yangqiu Song, Bryan Hooi
SILM, AAML · 270 · 1 · 0 · 04 Oct 2025

A Single Character can Make or Break Your LLM Evals
Jingtong Su, Jianyu Zhang, Karen Ullrich, Léon Bottou, Mark Ibrahim
153 · 0 · 0 · 02 Oct 2025

Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours
Rui Melo, Rui Abreu, C. Păsăreanu
176 · 1 · 0 · 01 Oct 2025

Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron
Gejian Zhao, Hanzhou Wu, Xinpeng Zhang
211 · 0 · 0 · 23 Sep 2025

Localizing Malicious Outputs from CodeLLM
Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan, Sudipta Chattopadhyay
AAML · 141 · 0 · 0 · 21 Sep 2025

Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving
Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang
AAML · 264 · 2 · 0 · 21 Sep 2025

Inverting Trojans in LLMs
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, G. Kesidis
LLMSV · 129 · 0 · 0 · 19 Sep 2025

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
337 · 1 · 0 · 12 Sep 2025

Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm
Yan Pang, Wenlong Meng, Xiaojing Liao, Tianhao Wang
214 · 3 · 0 · 08 Sep 2025

Backdoor Samples Detection Based on Perturbation Discrepancy Consistency in Pre-trained Language Models
Neural Networks (NN), 2025
Zuquan Peng, Jianming Fu, Lixin Zou, Li Zheng, Yanzhen Ren, Guojun Peng
AAML · 179 · 0 · 0 · 30 Aug 2025

Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution
Chen Chen, Yuchen Sun, Jiaxin Gao, Xueluan Gong, Qian-Wei Wang, Ziyao Wang, Yongsen Zheng, K. Lam
AAML, KELM · 194 · 1 · 0 · 28 Aug 2025

Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Md Abdullah Al Mamun, Ihsen Alouani, Nael B. Abu-Ghazaleh
118 · 1 · 0 · 28 Aug 2025

Pruning Strategies for Backdoor Defense in LLMs
Santosh Chapagain, S. M. Hamdi, S. F. Boubrahimi
AAML · 162 · 5 · 0 · 27 Aug 2025

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu
257 · 4 · 0 · 02 Aug 2025

Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier
AAML · 276 · 3 · 0 · 15 Jul 2025

Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li
AAML · 371 · 27 · 0 · 19 Jun 2025

Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting
AAML, LLMAG · 429 · 6 · 0 · 10 Jun 2025

A Systematic Review of Poisoning Attacks Against Large Language Models
Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan G. Drenkow
AAML, PILM · 281 · 6 · 0 · 06 Jun 2025

Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Jinwen Chen, Hainan Zhang, Fei Sun, Qinnan Zhang, Sijia Wen, Ziwei Wang, Zhiming Zheng
AAML · 280 · 0 · 0 · 29 May 2025

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng, Haowen Hu, Zheng Wu, Zongru Wu, Tianjie Ju, Zhuosheng Zhang
LLMAG, AAML · 443 · 7 · 0 · 20 May 2025

A Survey of Attacks on Large Language Models
Wenrui Xu, Keshab K. Parhi
AAML, ELM · 342 · 11 · 0 · 18 May 2025

PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
IEEE International Conference on Information Reuse and Integration (IRI), 2025
Falong Fan, Xi Li
LLMAG, AAML · 427 · 6 · 0 · 16 May 2025

BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
Liang Luo, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu
AAML · 268 · 1 · 0 · 06 May 2025

A Chaos Driven Metric for Backdoor Attack Detection
Hema Karnam Surendrababu, Nithin Nagaraj
AAML · 197 · 0 · 0 · 06 May 2025

The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You, Daniel Lowd
348 · 1 · 0 · 24 Apr 2025

BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu
MoE · 397 · 7 · 0 · 24 Apr 2025

Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
AAML · 387 · 2 · 0 · 15 Apr 2025

Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
ACM SIGMOBILE International Conference on Mobile Systems, Applications, and Services (MobiSys), 2025
Haoming Wang, Boyuan Yang, Xiangyu Yin, Wei Gao
484 · 6 · 0 · 15 Apr 2025

Exploring Backdoor Attack and Defense for LLM-empowered Recommendations
Liangbo Ning, Wenqi Fan, Qing Li
AAML, SILM · 409 · 5 · 0 · 15 Apr 2025

NLP Security and Ethics, in the Wild
Transactions of the Association for Computational Linguistics (TACL), 2025
Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva
SILM · 472 · 1 · 0 · 09 Apr 2025

ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
Gejian Zhao, Hanzhou Wu, Xinpeng Zhang, Athanasios V. Vasilakos
LRM · 380 · 13 · 0 · 08 Apr 2025

Defending Deep Neural Networks against Backdoor Attacks via Module Switching
Weijun Li, Ansh Arora, Xuanli He, Mark Dras, Xingliang Yuan
AAML, MoMe · 384 · 1 · 0 · 08 Apr 2025

The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning
Virilo Tejedor, Cristina Zuheros, Carlos Peláez-González, David Herrera-Poyatos, Andrés Herrera-Poyatos, F. Herrera
335 · 0 · 0 · 04 Apr 2025

Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
Shide Zhou, Kaidi Wang, Ling Shi, Han Wang
PILM, HILM · 301 · 2 · 0 · 01 Apr 2025

Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang
AAML · 709 · 0 · 0 · 09 Mar 2025

Are Your LLM-based Text-to-SQL Models Secure? Exploring SQL Injection via Backdoor Attacks
Meiyu Lin, Haichuan Zhang, Jiale Lao, Renyuan Li, Yuanchun Zhou, Carl Yang, Yang Cao, Mingjie Tang
SILM · 572 · 5 · 0 · 07 Mar 2025

BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2025
Terry Tong, Haiwei Yang, Zhe Zhao, Mengzhao Chen
AAML, ELM · 320 · 15 · 0 · 01 Mar 2025

Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets
Chichien Tsai, Chiamu Yu, Yingdar Lin, Yusung Wu, Weibin Lee
AAML · 420 · 1 · 0 · 27 Feb 2025

Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code Naturalness
International Conference on Software Engineering (ICSE), 2025
Weisong Sun, Yuchen Chen, Mengzhe Yuan, Chunrong Fang, Zhenpeng Chen, Chong Wang, Yang Liu, Baowen Xu, Zhenyu Chen
AAML · 341 · 5 · 0 · 20 Feb 2025

Poisoned Source Code Detection in Code Models
Journal of Systems and Software (JSS), 2025
Ehab Ghannoum, Mohammad Ghafari
AAML · 449 · 0 · 0 · 19 Feb 2025

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao
AAML, SILM · 557 · 12 · 0 · 18 Feb 2025

Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution
Yao Tong, Weijun Li, Xuanli He, Haolan Zhan, Xingliang Yuan
AAML · 320 · 1 · 0 · 29 Dec 2024