Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.14475
Cited By
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
27 April 2023
Jiazhao Li
Yijin Yang
Zhuofeng Wu
V. Vydiswaran
Chaowei Xiao
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger"
11 / 11 papers shown
Title
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang
Qi Pang
Xixun Lin
Shuai Wang
Daoyuan Wu
MoE
54
0
0
24 Apr 2025
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner
Yang Gao
Dana Alon
Donald Metzler
AAML
18
17
0
08 Apr 2024
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
22
72
0
07 Jun 2023
Defending against Insertion-based Textual Backdoor Attacks via Attribution
Jiazhao Li
Zhuofeng Wu
Wei Ping
Chaowei Xiao
V. Vydiswaran
40
23
0
03 May 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
Fanchao Qi
Yangyi Chen
Xurui Zhang
Mukai Li
Zhiyuan Liu
Maosong Sun
AAML
SILM
75
171
0
14 Oct 2021
BFClass: A Backdoor-free Text Classification Framework
Zichao Li
Dheeraj Mekala
Chengyu Dong
Jingbo Shang
SILM
51
27
0
22 Sep 2021
Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs
Kuan-Hao Huang
Kai-Wei Chang
137
61
0
26 Jan 2021
A Theoretical Analysis of the Repetition Problem in Text Generation
Z. Fu
Wai Lam
Anthony Man-Cho So
Bei Shi
64
89
0
29 Dec 2020
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
228
909
0
21 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1