Open Sesame! Universal Black Box Jailbreaking of Large Language Models

Applied Sciences (Appl. Sci.), 2023
4 September 2023
Raz Lapid, Ron Langberg, Moshe Sipper
AAML

Papers citing "Open Sesame! Universal Black Box Jailbreaking of Large Language Models"

50 / 94 papers shown
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia
23 Nov 2025
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma
AAML
16 Nov 2025
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Hamin Koo, Minseon Kim, Jaehyung Kim
03 Nov 2025
Diffusion LLMs are Natural Adversaries for any LLM
David Lüdke, Tom Wollschlager, Paul Ungermann, Stephan Günnemann, Leo Schwinn
DiffM
31 Oct 2025
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
AAML
17 Oct 2025
LLM Jailbreak Detection for (Almost) Free!
Guorui Chen, Yifan Xia, Xiaojun Jia, Ruoyao Xiao, Juil Sock, Jindong Gu
18 Sep 2025
Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
18 Sep 2025
HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena Tutubalina, Oleg Y. Rogov
22 Aug 2025
SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu
AAML, MU, KELM
21 Aug 2025
Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang
LRM
16 Aug 2025
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
AAML, SILM
14 Aug 2025
GPT, But Backwards: Exactly Inverting Language Model Outputs
Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas C. Cordeiro
02 Jul 2025
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang
AAML, VLM
27 Jun 2025
MEF: A Capability-Aware Multi-Encryption Framework for Evaluating Vulnerabilities in Black-Box Large Language Models
Mingyu Yu, Wei Wang, Y. X. Wei, Sujuan Qin, Fei Gao, Wenmin Li
AAML
29 May 2025
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Taiye Chen, Zeming Wei, Ang Li, Yisen Wang
AAML
21 May 2025
BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
Wenqi Lyu, Zerui Li, Yanyuan Qiao, Qi Wu
AAML
18 May 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David Evans
LLMSV
23 Apr 2025
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera
AAML
07 Apr 2025
Don't Lag, RAG: Training-Free Adversarial Detection Using RAG
Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar
VLM, ObjD, AAML
07 Apr 2025
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Computer Vision and Pattern Recognition (CVPR), 2025
Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang
AAML
26 Mar 2025
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin E. Wu, Francesco Pinto, Zhongfu Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li
LLMAG, AAML
20 Mar 2025
AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
Dillon Bowen, Ann-Kathrin Dombrowski, Adam Gleave, Chris Cundy
ELM
17 Mar 2025
JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang
ELM
26 Feb 2025
KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal
AAML
05 Feb 2025
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Neural Information Processing Systems (NeurIPS), 2024
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang
28 Jan 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jonathan Nöther, Adish Singla, Goran Radanović
AAML
14 Jan 2025
Global Challenge for Safe and Secure LLMs Track 1
Yang Liu, Yihao Huang, Yang Liu, Peng Yan Tan, Weng Kuan Yau, ..., Yan Wang, Rick Siow Mong Goh, Liangli Zhen, Yingjie Zhang, Zhe Zhao
ELM, AILaw
21 Nov 2024
DROJ: A Prompt-Driven Attack against Large Language Models
Leyang Hu, Boran Wang
14 Nov 2024
Diversity Helps Jailbreak Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao
AAML
06 Nov 2024
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Zijun Wang, Haoqin Tu, J. Mei, Bingchen Zhao, Yanjie Wang, Cihang Xie
11 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
International Conference on Learning Representations (ICLR), 2024
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin
09 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi
AAML
02 Oct 2024
AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu
AAML
11 Sep 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML
05 Sep 2024
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
AAML
01 Sep 2024
On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective
Tal Alter, Raz Lapid, Moshe Sipper
AAML
25 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Neural Information Processing Systems (NeurIPS), 2024
Jingtong Su, Mingyu Lee, SangKeun Lee
02 Aug 2024
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
Huiyu Xu, Wenhui Zhang, Peng Kuang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren
AAML, LLMAG
23 Jul 2024
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion
16 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia
AAML
11 Jul 2024
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li
AAML
05 Jul 2024
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Xiaotian Zou, Ke Li, Yongkang Chen
MLLM
01 Jul 2024
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang
PILM
26 Jun 2024
Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study
Swarm and Evolutionary Computation (Swarm Evol. Comput.), 2024
Hao Hao, Xiaoqun Zhang, Aimin Zhou
ELM
15 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang
ELM
13 Jun 2024
AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou
AAML
06 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang
OffRL, KELM, AILaw
03 Jun 2024
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin
AAML
03 Jun 2024
Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu, Chenhui Hu
AAML
01 Jun 2024
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Yang Liu, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Simeng Qin, Min Lin
AAML
31 May 2024