Cited By

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
Yuanpu Cao, Bochuan Cao, Jinghui Chen
arXiv 2312.00027 | 15 November 2023
Papers citing "Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections" (5 of 5 papers shown)

BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
Z. Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu
AAML | 0 citations | 06 May 2025

Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models
Y. Gong, Zhuo Chen, Miaokun Chen, Fengchang Yu, Wei-Tsung Lu, XiaoFeng Wang, Xiaozhong Liu, J. Liu
AAML, SILM | 0 citations | 03 Feb 2025

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj, Soujanya Poria
ALM | 14 citations | 22 Oct 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM | 11,730 citations | 04 Mar 2022

Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela
SILM | 225 citations | 15 Apr 2021