Open Sesame! Universal Black Box Jailbreaking of Large Language Models

Applied Sciences (Appl. Sci.), 2023 · 4 September 2023
Raz Lapid, Ron Langberg, Moshe Sipper
AAML

Papers citing "Open Sesame! Universal Black Box Jailbreaking of Large Language Models"

44 of 94 citing papers shown

Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries
Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing
31 May 2024

A Theoretical Understanding of Self-Correction through In-context Alignment
Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang
LRM · 28 May 2024

PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
International Conference on Machine Learning (ICML), 2024
Ziyang Zhang, Qizhen Zhang, Jakob N. Foerster
AAML · 13 May 2024

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Computer/Law Journal (JITPL), 2024
Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang
06 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova, James Zou
AAML · 26 Apr 2024

Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors
Raz Lapid, Almog Dubin, Moshe Sipper
AAML · 18 Apr 2024

Ethical Framework for Responsible Foundational Models in Medical Imaging
Abhijit Das, Debesh Jha, Jasmer Sanjotra, Onkar Susladkar, Suramyaa Sarkar, A. Rauniyar, Nikhil Tomar, Vanshali Sharma, Ulas Bagci
MedIm · 14 Apr 2024

Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models
Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan
09 Apr 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
International Conference on Learning Representations (ICLR), 2024
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
AAML · 02 Apr 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao, Edoardo Debenedetti, Avi Schwarzschild, Maksym Andriushchenko, Francesco Croce, ..., Nicolas Flammarion, George J. Pappas, F. Tramèr, Hamed Hassani, Eric Wong
ALM, ELM, AAML · 28 Mar 2024

Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
AAML · 26 Mar 2024

XAI-Based Detection of Adversarial Attacks on Deepfake Detectors
Ben Pinhasov, Raz Lapid, Rony Ohayon, Moshe Sipper, Y. Aperstein
AAML · 05 Mar 2024

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, ..., Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
SyDa · 26 Feb 2024

Multi-Bit Distortion-Free Watermarking for Large Language Models
Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, Brian L. Mark
WaLM, VLM · 26 Feb 2024

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
Jiabao Ji, Bairu Hou, Avi Schwarzschild, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang
AAML · 25 Feb 2024

Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Malemir Chegini, Soheil Feizi
MIALM · 23 Feb 2024

Coercing LLMs to do and reveal (almost) anything
Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein
AAML · 21 Feb 2024

Is the System Message Really Important to Jailbreaks in Large Language Models?
Xiaotian Zou, Yongkang Chen, Ke Li
20 Feb 2024

A StrongREJECT for Empty Jailbreaks
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, ..., Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
15 Feb 2024

PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo
ELM · 15 Feb 2024

Attacking Large Language Models with Projected Gradient Descent
Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann
AAML, SILM · 14 Feb 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
AAML · 14 Feb 2024

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xing-ming Guo, Fangxu Yu, Huan Zhang, Lianhui Qin, Bin Hu
AAML · 13 Feb 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
LLMAG, LM&Ro · 13 Feb 2024

Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, Yang Liu
13 Feb 2024

Security and Privacy Challenges of Large Language Models: A Survey
B. Das, M. H. Amini, Yanzhao Wu
PILM, ELM · 30 Jan 2024

Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Y. Wang
30 Jan 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
AAML · 29 Jan 2024

Black-Box Access is Insufficient for Rigorous AI Audits
Conference on Fairness, Accountability and Transparency (FAccT), 2024
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell
AAML · 25 Jan 2024

When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, Shuyuan Yang
19 Jan 2024

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap
Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, Kay Chen Tan
ELM · 18 Jan 2024

All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
Kazuhiro Takemoto
18 Jan 2024

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Zhuo Zhang, Guangyu Shen, Guanhong Tao, Shuyang Cheng, Xiangyu Zhang
08 Dec 2023

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Neural Information Processing Systems (NeurIPS), 2023
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi
04 Dec 2023

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
MLLM · 09 Nov 2023

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
Leo Schwinn, David Dobre, Stephan Günnemann, Gauthier Gidel
AAML, ELM · 30 Oct 2023

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, A. Nenkova, Tong Sun
SILM, AAML · 23 Oct 2023

Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang, Xingshu Chen, Rui Tang
16 Oct 2023

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
International Conference on Learning Representations (ICLR), 2023
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen
AAML · 10 Oct 2023

Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach
SILM · 03 Oct 2023

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
International Conference on Learning Representations (ICLR), 2023
Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao
SILM · 03 Oct 2023

Can LLM-Generated Misinformation Be Detected?
International Conference on Learning Representations (ICLR), 2023
Canyu Chen, Kai Shu
DeLMO · 25 Sep 2023

Patch of Invisibility: Naturalistic Physical Black-Box Adversarial Attacks on Object Detectors
Raz Lapid, Eylon Mizrahi, Moshe Sipper
AAML · 07 Mar 2023