Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs

23 May 2025
Jiawei Kong, Hao Fang, Xiaochen Yang, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang
AAML

Papers citing "Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs"

17 papers shown

When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents
Matous Kozak, Roshanak Zilouchian Moghaddam, Siva Sivaraman
LLMAG, ELM
12 Jul 2025

Injecting Universal Jailbreak Backdoors into LLMs in Minutes
International Conference on Learning Representations (ICLR), 2025
Zhuowei Chen, Qiannan Zhang, Shichao Pei
09 Feb 2025

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, B. Li
OffRL
07 Feb 2025

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, ..., Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang
ALM
18 Jun 2024

Safety Alignment Should Be Made More Than Just a Few Tokens Deep
International Conference on Learning Representations (ICLR), 2024
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
10 Jun 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
International Conference on Learning Representations (ICLR), 2024
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
AAML
02 Apr 2024

BadEdit: Backdooring large language models by model editing
Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu
SyDa, AAML, KELM
20 Mar 2024

Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks
IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024
Yige Li, Jiabo He, Hanxun Huang, Xingjun Ma, Yu-Gang Jiang
AAML
27 Jan 2024

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
International Conference on Learning Representations (ICLR), 2024
Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li
LRM, SILM
20 Jan 2024

Universal Jailbreak Backdoors from Poisoned Human Feedback
International Conference on Learning Representations (ICLR), 2023
Javier Rando, Florian Tramèr
24 Nov 2023

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang
10 Oct 2023

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
SILM
31 Jul 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
27 Jul 2023

BackdoorBench: A Comprehensive Benchmark of Backdoor Learning
Neural Information Processing Systems (NeurIPS), 2022
Baoyuan Wu, Hongrui Chen, Ruotong Wang, Zihao Zhu, Shaokui Wei, Danni Yuan, Chaoxiao Shen
ELM, AAML
25 Jun 2022

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
28 May 2020

Weight Poisoning Attacks on Pre-trained Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Keita Kurita, Paul Michel, Graham Neubig
AAML, SILM
14 Apr 2020

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Tianyu Gu, Brendan Dolan-Gavitt, S. Garg
SILM
22 Aug 2017