Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures

9 December 2021
Eugene Bagdasaryan
Vitaly Shmatikov
    SILM
    AAML

Papers citing "Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures"

Showing 50 of 52 citing papers.

Robo-Troj: Attacking LLM-based Task Planners
Mohaiminul Al Nahian
Zainab Altaweel
David Reitano
Sabbir Ahmed
Saumitra Lohokare
Shiqi Zhang
Adnan Siraj Rakin
AAML
23 Apr 2025

Propaganda via AI? A Study on Semantic Backdoors in Large Language Models
Nay Myat Min
Long H. Pham
Yige Li
Jun Sun
AAML
15 Apr 2025

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min
Long H. Pham
Yige Li
Jun Sun
AAML
18 Nov 2024

SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
Aidan Wong
He Cao
Zijing Liu
Yu Li
21 Oct 2024

CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng
Xi Chen
Yuwen Pu
Xuhong Zhang
Tianyu Du
Shouling Ji
02 Sep 2024

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen
Hanqing Guo
Guangjing Wang
Yuanda Wang
Qiben Yan
AAML
01 Sep 2024

On Large Language Models in National Security Applications
William N. Caballero
Phillip R. Jenkins
ELM
03 Jul 2024

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Yi Zeng
Weiyu Sun
Tran Ngoc Huynh
Dawn Song
Bo Li
Ruoxi Jia
AAML
LLMSV
24 Jun 2024

BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
18 May 2024

Immunization against harmful fine-tuning attacks
Domenic Rosati
Jan Wehner
Kai Williams
Lukasz Bartoszcze
Jan Batzner
Hassan Sajjad
Frank Rudzicz
AAML
26 Feb 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
14 Feb 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu
Yugeng Liu
Ziqing Yang
Xinyue Shen
Michael Backes
Yang Zhang
AAML
08 Feb 2024

Manipulating Trajectory Prediction with Backdoors
Kaouther Messaoud
Kathrin Grosse
Mickaël Chen
Matthieu Cord
Patrick Pérez
Alexandre Alahi
AAML
LLMSV
21 Dec 2023

Translating Legalese: Enhancing Public Understanding of Court Opinions with Legal Summarizers
Elliott Ash
Aniket Kesari
Suresh Naidu
Lena Song
Dominik Stammbach
ELM
11 Nov 2023

Label Poisoning is All You Need
Rishi Jha
J. Hayase
Sewoong Oh
AAML
29 Oct 2023

Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
Shawn Shan
Wenxin Ding
Josephine Passananti
Stanley Wu
Haitao Zheng
Ben Y. Zhao
SILM
DiffM
20 Oct 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu
Yuqi Jia
Runpeng Geng
Jinyuan Jia
Neil Zhenqiang Gong
SILM
LLMAG
19 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
16 Oct 2023

Defending Our Privacy With Backdoors
Dominik Hintersdorf
Lukas Struppek
Daniel Neider
Kristian Kersting
SILM
AAML
12 Oct 2023

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
19 Sep 2023

Robust Backdoor Attacks on Object Detection in Real World
Yaguan Qian
Boyuan Ji
Shuke He
Shenhui Huang
Xiang Ling
Bin Wang
Wen Wang
16 Sep 2023

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng
Zongru Wu
Wei Du
Haodong Zhao
Wei Lu
Gongshen Liu
SILM
AAML
12 Sep 2023

MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
Arezoo Rajabi
Surudhi Asokraj
Feng-Shr Jiang
Luyao Niu
Bhaskar Ramasubramanian
J. Ritcey
Radha Poovendran
AAML
30 Aug 2023

LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors
Chengkun Wei
Wenlong Meng
Zhikun Zhang
M. Chen
Ming-Hui Zhao
Wenjing Fang
Lei Wang
Zihui Zhang
Wenzhi Chen
AAML
26 Aug 2023

A Cost Analysis of Generative Language Models and Influence Operations
Micah Musser
07 Aug 2023

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak
  Prompts on Large Language Models
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen
Z. Chen
Michael Backes
Yun Shen
Yang Zhang
SILM
07 Aug 2023

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan
Vikas Yadav
Shiyang Li
Lichang Chen
Zheng Tang
Hai Wang
Vijay Srinivasan
Xiang Ren
Hongxia Jin
SILM
31 Jul 2023

MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
16 Jul 2023

Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
Aidan O'Gara
05 Jul 2023

Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Tianwei Zhang
Yepang Liu
Haoyu Wang
Yanhong Zheng
Yang Liu
SILM
08 Jun 2023

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
Kai Mei
Zheng Li
Zhenting Wang
Yang Zhang
Shiqing Ma
AAML
SILM
28 May 2023

Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao
S. Vashistha
Atharva Naik
Somak Aditya
Monojit Choudhury
24 May 2023

Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Shangbin Feng
Weijia Shi
Yuyang Bai
Vidhisha Balachandran
Tianxing He
Yulia Tsvetkov
KELM
17 May 2023

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
Shangbin Feng
Chan Young Park
Yuhan Liu
Yulia Tsvetkov
15 May 2023

Dual Use Concerns of Generative AI and Large Language Models
A. Grinbaum
Laurynas Adomaitis
MedIm
AI4CE
13 May 2023

Two-in-One: A Model Hijacking Attack Against Text Generation Models
Waiman Si
Michael Backes
Yang Zhang
A. Salem
SILM
12 May 2023

Entity-Based Evaluation of Political Bias in Automatic Summarization
Karen Zhou
Chenhao Tan
03 May 2023

In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT
Xinyue Shen
Z. Chen
Michael Backes
Yang Zhang
18 Apr 2023

Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs
Da Silva Gameiro Henrique
Andrei Kucharavy
R. Guerraoui
DeLMO
18 Apr 2023

UNICORN: A Unified Backdoor Trigger Inversion Framework
Zhenting Wang
Kai Mei
Juan Zhai
Shiqing Ma
LLMSV
05 Apr 2023

Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?
Adaku Uchendu
Jooyoung Lee
Hua Shen
Thai Le
Ting-Hao 'Kenneth' Huang
Dongwon Lee
DeLMO
03 Apr 2023

Verifying the Robustness of Automatic Credibility Assessment
Piotr Przybyła
A. Shvets
Horacio Saggion
DeLMO
AAML
14 Mar 2023

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
23 Feb 2023

Mithridates: Auditing and Boosting Backdoor Resistance of Machine Learning Pipelines
Eugene Bagdasaryan
Vitaly Shmatikov
AAML
09 Feb 2023

Training-free Lexical Backdoor Attacks on Language Models
Yujin Huang
Terry Yue Zhuo
Qiongkai Xu
Han Hu
Xingliang Yuan
Chunyang Chen
SILM
08 Feb 2023

SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Xiaochuang Han
Sachin Kumar
Yulia Tsvetkov
31 Oct 2022

Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems
Sahar Abdelnabi
Mario Fritz
AAML
07 Sep 2022

Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers
Limin Yang
Zhi Chen
Jacopo Cortellazzi
Feargus Pendlebury
Kevin Tu
Fabio Pierazzi
Lorenzo Cavallaro
Gang Wang
AAML
11 Feb 2022

Concealed Data Poisoning Attacks on NLP Models
Eric Wallace
Tony Zhao
Shi Feng
Sameer Singh
SILM
23 Oct 2020

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Samson Tan
Shafiq R. Joty
Min-Yen Kan
R. Socher
09 May 2020