arXiv:2404.07921
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
Zeyi Liao, Huan Sun · 11 April 2024 · AAML
Papers citing "AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs" (50 of 63 shown)
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Wei Zhao, Zhe Li, Jun Sun · 04 Dec 2025 · AAML

Are LLMs Good Safety Agents or a Propaganda Engine?
Neemesh Yadav, Francesco Ortu, Jiarui Liu, Joeun Yook, Bernhard Schölkopf, Rada Mihalcea, Alberto Cazzaniga, Zhijing Jin · 28 Nov 2025

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia · 23 Nov 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun · 20 Nov 2025 · AAML

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann · 06 Nov 2025 · AAML

Black-box Optimization of LLM Outputs by Asking for Directions
Jie Zhang, Meng Ding, Yang Liu, Jue Hong, F. Tramèr · 19 Oct 2025 · AAML

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong · 17 Oct 2025 · AAML

A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
Bibekananda Patra, Aditya Mahesh Kolte, Sandipan Bandyopadhyay · 10 Oct 2025

VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
Aofan Liu, Lulu Tang · 09 Oct 2025 · MLLM, VLM

Untargeted Jailbreak Attack
Xinzhe Huang, Wenjing Hu, Tianhang Zheng, Kedong Xiu, Xiaojun Jia, Haiyan Zhao, Zhan Qin, Kui Ren · 03 Oct 2025 · AAML

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Juil Sock, Adel Bibi, Tongliang Liu · 25 Sep 2025 · AAML

Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
Junjie Mu, Zonghao Ying, Zhekui Fan, Zonglei Jing, Yaoyuan Zhang, Zhengmin Yu, Wenxin Zhang, Quanchen Zou, Xiangzheng Zhang · 08 Sep 2025 · AAML

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu · 21 Aug 2025 · AAML, MU, KELM

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu · 20 Aug 2025 · AAML

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee · 19 Aug 2025 · SILM

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo · 17 Aug 2025

SceneJailEval: A Scenario-Adaptive Multi-Dimensional Framework for Jailbreak Evaluation
Lai Jiang, Yuekang Li, Xiaohan Zhang, Youtao Ding, Li Pan · 08 Aug 2025

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin · 05 Aug 2025 · AAML

Activation-Guided Local Editing for Jailbreaking Attacks
Jiecong Wang, Haoran Li, Hao Peng, Ziqian Zeng, Zihao Wang, Haohua Du, Zhengtao Yu · 01 Aug 2025 · AAML

Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao · 09 Jul 2025 · AAML

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang, Zixun Zhang, Zizhou Wang, Xiaobing Sun, Zhen Li, Liangli Zhen, Xiaohua Xu · 20 Jun 2025 · AAML

Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara · 11 Jun 2025

Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
Yukai Zhou, Sibei Yang, Wenjie Wang · 09 Jun 2025 · AAML

Adversarial Preference Learning for Robust LLM Alignment (ACL 2025)
Yuanfu Wang, Pengyu Wang, Chunyu Li, Bo Tang, Junyi Zhu, ..., Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Junchi Yan · 30 May 2025 · AAML

One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs (ICLR 2025)
Linbao Li, Y. Liu, Daojing He, Yu Li · 23 May 2025 · AAML

Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives
Wenhan Chang, Tianqing Zhu, Yu Zhao, Shuangyong Song, Ping Xiong, Wanlei Zhou · 23 May 2025

Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye · 21 May 2025 · AAML

Adversarial Suffix Filtering: a Defense Pipeline for LLMs
David Khachaturov, Robert D. Mullins · 14 May 2025 · AAML

Demystifying optimized prompts in language models
Rimon Melamed, Lucas H. McCabe, H. H. Huang · 04 May 2025

LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution
Zhuoran Yang, Jie Peng · 02 Apr 2025 · AAML

PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang · 02 Apr 2025 · MLLM, AAML

Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu · 28 Mar 2025

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger, Boussad Addad, Katarzyna Kapusta · 08 Mar 2025 · AAML

LLM-Safety Evaluations Lack Robustness
Tim Beyer, Sophie Xhonneux, Simon Geisler, Gauthier Gidel, Leo Schwinn, Stephan Günnemann · 04 Mar 2025 · ALM, ELM

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang · 24 Feb 2025 · ALM, ELM

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschlager, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann · 24 Feb 2025 · AAML

TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice (NAACL 2025)
Aman Goel, Xian Carrie Wu, Zhe Wang, Dmitriy Bespalov, Yanjun Qi · 21 Feb 2025

Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
Yue Xu, Chengyan Fu, Li Xiong, Sibei Yang, Wenjie Wang · 17 Feb 2025

StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models
Shehel Yoosuf, Temoor Ali, Ahmed Lekssays, Mashael Alsabah, Issa M. Khalil · 17 Feb 2025

Fast Proxies for LLM Robustness Evaluation
Tim Beyer, Jan Schuchardt, Leo Schwinn, Stephan Günnemann · 14 Feb 2025 · AAML

KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal · 05 Feb 2025 · AAML

Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das, Edward Raff, Aman Chadha, Manas Gaur · 20 Dec 2024 · AAML

Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation
Ke Zhao, Huayang Huang, Miao Li, Yu Wu · 21 Nov 2024 · AAML

AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
Vishal Kumar, Zeyi Liao, Jaylen Jones, Huan Sun · 29 Oct 2024 · AAML

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring (NAACL 2024)
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, ..., Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che · 28 Oct 2024 · AAML

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction (ICLR 2024)
Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang · 25 Oct 2024 · AAML

Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities (NAACL 2024)
Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, J. Gao · 24 Oct 2024

On the Role of Attention Heads in Large Language Model Safety (ICLR 2024)
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Cunchun Li, Yongbin Li · 17 Oct 2024

Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo · 15 Oct 2024 · AAML

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy (ICLR 2024)
Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou · 09 Oct 2024