2305.14910
From Shortcuts to Triggers: Backdoor Defense with Denoised PoE
24 May 2023
Qin Liu
Fei Wang
Chaowei Xiao
Muhao Chen
AAML
Papers citing
"From Shortcuts to Triggers: Backdoor Defense with Denoised PoE"
24 / 24 papers shown
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
62
1
0
26 Feb 2025
Using Interleaved Ensemble Unlearning to Keep Backdoors at Bay for Finetuning Vision Transformers
Zeyu Michael Li
AAML
18
0
0
01 Oct 2024
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu
Wenjie Mo
Terry Tong
Jiashu Xu
Fei Wang
Chaowei Xiao
Muhao Chen
AAML
31
4
0
30 Sep 2024
Rethinking Backdoor Detection Evaluation for Language Models
Jun Yan
Wenjie Jacky Mo
Xiang Ren
Robin Jia
ELM
35
1
0
31 Aug 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
47
8
0
20 Jul 2024
Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense
Qi Zhou
Zipeng Ye
Yubo Tang
Wenjian Luo
Yuhui Shi
Yan Jia
AAML
22
2
0
07 Jul 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Yi Zeng
Weiyu Sun
Tran Ngoc Huynh
Dawn Song
Bo Li
Ruoxi Jia
AAML
LLMSV
32
17
0
24 Jun 2024
BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
20
0
0
18 May 2024
Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge
Narek Maloyan
Ekansh Verma
Bulat Nutfullin
Bislan Ashinov
41
7
0
21 Apr 2024
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf
Qin Liu
Muhao Chen
AAML
19
8
0
02 Apr 2024
Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning
Xiaopeng Xie
Ming Yan
Xiwen Zhou
Chenlong Zhao
Suli Wang
Yong Zhang
Joey Tianyi Zhou
AAML
25
0
0
30 Mar 2024
Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao
Leilei Gan
Anh Tuan Luu
Jie Fu
Lingjuan Lyu
Meihuizi Jia
Jinming Wen
AAML
19
22
0
19 Feb 2024
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang
Xiangyu Zhou
Dongxiao Zhu
30
32
0
16 Nov 2023
Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng
Zongru Wu
Wei Du
Haodong Zhao
Wei Lu
Gongshen Liu
SILM
AAML
18
15
0
12 Sep 2023
Backdoor Defense via Deconfounded Representation Learning
Zaixin Zhang
Qi Liu
Zhicai Wang
Zepu Lu
Qingyong Hu
AAML
44
39
0
13 Mar 2023
A Study of the Attention Abnormality in Trojaned BERTs
Weimin Lyu
Songzhu Zheng
Teng Ma
Chao Chen
51
53
0
13 May 2022
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
Fanchao Qi
Yangyi Chen
Xurui Zhang
Mukai Li
Zhiyuan Liu
Maosong Sun
AAML
SILM
77
171
0
14 Oct 2021
BFClass: A Backdoor-free Text Classification Framework
Zichao Li
Dheeraj Mekala
Chengyu Dong
Jingbo Shang
SILM
56
27
0
22 Sep 2021
Competency Problems: On Finding and Removing Artifacts in Language Data
Matt Gardner
William Merrill
Jesse Dodge
Matthew E. Peters
Alexis Ross
Sameer Singh
Noah A. Smith
151
106
0
17 Apr 2021
Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise
Pengfei Chen
Junjie Ye
Guangyong Chen
Jingwei Zhao
Pheng-Ann Heng
NoLa
24
55
0
10 Dec 2020
Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification
Chuanshuai Chen
Jiazhu Dai
SILM
48
126
0
11 Jul 2020
Combating noisy labels by agreement: A joint training method with co-regularization
Hongxin Wei
Lei Feng
Xiangyu Chen
Bo An
NoLa
303
488
0
05 Mar 2020
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
187
574
0
02 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018