Adversarial Demonstration Attacks on Large Language Models
arXiv 2305.14950 · 24 May 2023
Jiong Wang, Zi-yang Liu, Keun Hee Park, Zhuojun Jiang, Zhaoheng Zheng, Zhuofeng Wu, Muhao Chen, Chaowei Xiao
SILM
Papers citing "Adversarial Demonstration Attacks on Large Language Models" (41 papers)
Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu
AAML · 02 May 2025

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera
AAML · 07 Apr 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou
24 Feb 2025

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang
28 Jan 2025

Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, K. Lam
KELM · AAML · 27 Nov 2024

The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks
Daniel Ayzenshteyn, Roy Weiss, Yisroel Mirsky
AAML · 20 Oct 2024

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu
AAML · 11 Sep 2024

MILE: A Mutation Testing Framework of In-Context Learning Systems
Zeming Wei, Yihao Zhang, Meng Sun
07 Sep 2024

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su, Mingyu Lee, SangKeun Lee
02 Aug 2024

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
LLMAG · AAML · 30 Jul 2024

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali, Jia He, C. Barberan, Richard Anarfi
30 Jul 2024

Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li
AAML · 05 Jul 2024

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang
PILM · 26 Jun 2024

Security of AI Agents
Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen
LLMAG · 12 Jun 2024

Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
Sheng-Lun Wei, Cheng-Kuang Wu, Hen-Hsen Huang, Hsin-Hsi Chen
05 Jun 2024

Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu, Chenhui Hu
AAML · 01 Jun 2024

Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models
Simon Chi Lok Yu, Jie He, Pasquale Minervini, Jeff Z. Pan
24 May 2024

Exploring the Robustness of In-Context Learning with Noisy Labels
Chen Cheng, Xinzhi Yu, Haodong Wen, Jinsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei
NoLa · 28 Apr 2024

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, D. Song, Bo-wen Li
AAML · KELM · 19 Mar 2024

Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks
Yixin Cheng, Markos Georgopoulos, V. Cevher, Grigorios G. Chrysos
AAML · 14 Feb 2024

Machine Unlearning in Large Language Models
Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen
MU · 03 Feb 2024

Security and Privacy Challenges of Large Language Models: A Survey
B. Das, M. H. Amini, Yanzhao Wu
PILM · ELM · 30 Jan 2024

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi
12 Jan 2024

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions
Fangzhou Wu, Xiaogeng Liu, Chaowei Xiao
AAML · SILM · 07 Dec 2023

Hijacking Context in Large Multi-modal Models
Joonhyun Jeong
MLLM · 07 Dec 2023

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
AAML · 16 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
PILM · 16 Oct 2023

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang
10 Oct 2023

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
AAML · 05 Oct 2023

HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
Leonardo Ranaldi, Fabio Massimo Zanzotto
21 Sep 2023

MathAttack: Attacking Large Language Models Towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin, Jie Yao, Jianan Ye, Wei Liu, Wei Wang, Xiaowei Huang, Kaizhu Huang
AAML · 04 Sep 2023

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hongwei Li, Rongxing Lu, Shui Yu
SILM · 28 Aug 2023

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin
24 Aug 2023

Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Pouya Pezeshkpour, Estevam R. Hruschka
LRM · 22 Aug 2023

Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, M. Shah, Ming Yang, F. Khan
VLM · 25 Jul 2023

Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models
Jose Berengueres, Marybeth Sandell
06 Jun 2023

On the Relation between Sensitivity and Accuracy in In-context Learning
Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
16 Sep 2022

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
AILaw · LRM · 18 Apr 2021

What Makes Good In-Context Examples for GPT-3?
Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen
AAML · RALM · 17 Jan 2021

Certified Robustness to Adversarial Word Substitutions
Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang
AAML · 03 Sep 2019

Generating Natural Language Adversarial Examples
M. Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang
AAML · 21 Apr 2018