Open Sesame! Universal Black Box Jailbreaking of Large Language Models
Raz Lapid, Ron Langberg, Moshe Sipper · AAML
Applied Sciences (Appl. Sci.), 2023 · arXiv:2309.01446 · 4 September 2023

Papers citing "Open Sesame! Universal Black Box Jailbreaking of Large Language Models" (showing 50 of 94)

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia · 23 Nov 2025

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma · AAML · 16 Nov 2025

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Hamin Koo, Minseon Kim, Jaehyung Kim · 03 Nov 2025

Diffusion LLMs are Natural Adversaries for any LLM
David Lüdke, Tom Wollschlager, Paul Ungermann, Stephan Günnemann, Leo Schwinn · DiffM · 31 Oct 2025

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong · AAML · 17 Oct 2025

LLM Jailbreak Detection for (Almost) Free!
Guorui Chen, Yifan Xia, Xiaojun Jia, Ruoyao Xiao, Juil Sock, Jindong Gu · 18 Sep 2025

Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau · 18 Sep 2025

HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena Tutubalina, Oleg Y. Rogov · 22 Aug 2025

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu · AAML, MU, KELM · 21 Aug 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang · LRM · 16 Aug 2025

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li · AAML, SILM · 14 Aug 2025

GPT, But Backwards: Exactly Inverting Language Model Outputs
Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas C. Cordeiro · 02 Jul 2025

VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang · AAML, VLM · 27 Jun 2025

MEF: A Capability-Aware Multi-Encryption Framework for Evaluating Vulnerabilities in Black-Box Large Language Models
Mingyu Yu, Wei Wang, Y. X. Wei, Sujuan Qin, Fei Gao, Wenmin Li · AAML · 29 May 2025

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Taiye Chen, Zeming Wei, Ang Li, Yisen Wang · AAML · 21 May 2025

BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
Wenqi Lyu, Zerui Li, Yanyuan Qiao, Qi Wu · AAML · 18 May 2025

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David Evans · LLMSV · 23 Apr 2025

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera · AAML · 07 Apr 2025

Don't Lag, RAG: Training-Free Adversarial Detection Using RAG
Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar · VLM, ObjD, AAML · 07 Apr 2025

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Computer Vision and Pattern Recognition (CVPR), 2025
Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang · AAML · 26 Mar 2025

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin E. Wu, Francesco Pinto, Zhongfu Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li · LLMAG, AAML · 20 Mar 2025

AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
Dillon Bowen, Ann-Kathrin Dombrowski, Adam Gleave, Chris Cundy · ELM · 17 Mar 2025

JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang · ELM · 26 Feb 2025

KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal · AAML · 05 Feb 2025

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Neural Information Processing Systems (NeurIPS), 2024
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang · 28 Jan 2025

Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jonathan Nöther, Adish Singla, Goran Radanović · AAML · 14 Jan 2025

Global Challenge for Safe and Secure LLMs Track 1
Yang Liu, Yihao Huang, Yang Liu, Peng Yan Tan, Weng Kuan Yau, ..., Yan Wang, Rick Siow Mong Goh, Liangli Zhen, Yingjie Zhang, Zhe Zhao · ELM, AILaw · 21 Nov 2024

DROJ: A Prompt-Driven Attack against Large Language Models
Leyang Hu, Boran Wang · 14 Nov 2024

Diversity Helps Jailbreak Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao · AAML · 06 Nov 2024

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Zijun Wang, Haoqin Tu, J. Mei, Bingchen Zhao, Yanjie Wang, Cihang Xie · 11 Oct 2024

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
International Conference on Learning Representations (ICLR), 2024
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin · 09 Oct 2024

FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi · AAML · 02 Oct 2024

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu · AAML · 11 Sep 2024

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang · PILM, AAML · 05 Sep 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang · AAML · 01 Sep 2024

On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective
Tal Alter, Raz Lapid, Moshe Sipper · AAML · 25 Aug 2024

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Neural Information Processing Systems (NeurIPS), 2024
Jingtong Su, Mingyu Lee, SangKeun Lee · 02 Aug 2024

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
Huiyu Xu, Wenhui Zhang, Peng Kuang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren · AAML, LLMAG · 23 Jul 2024

Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion · 16 Jul 2024

Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia · AAML · 11 Jul 2024

Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li · AAML · 05 Jul 2024

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Xiaotian Zou, Ke Li, Yongkang Chen · MLLM · 01 Jul 2024

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang · PILM · 26 Jun 2024

Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study
Swarm and Evolutionary Computation (Swarm Evol. Comput.), 2024
Hao Hao, Xiaoqun Zhang, Aimin Zhou · ELM · 15 Jun 2024

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang · ELM · 13 Jun 2024

AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou · AAML · 06 Jun 2024

Safeguarding Large Language Models: A Survey
Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang · OffRL, KELM, AILaw · 03 Jun 2024

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin · AAML · 03 Jun 2024

Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu, Chenhui Hu · AAML · 01 Jun 2024

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Yang Liu, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Simeng Qin, Min Lin · AAML · 31 May 2024