Open Sesame! Universal Black Box Jailbreaking of Large Language Models (arXiv:2309.01446)
Applied Sciences (Appl. Sci.), 2023
Raz Lapid, Ron Langberg, Moshe Sipper. 4 September 2023. [AAML]
Papers citing "Open Sesame! Universal Black Box Jailbreaking of Large Language Models" (50 of 94 papers shown)
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia. 23 Nov 2025.

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma. 16 Nov 2025. [AAML]

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Hamin Koo, Minseon Kim, Jaehyung Kim. 03 Nov 2025.

Diffusion LLMs are Natural Adversaries for any LLM
David Lüdke, Tom Wollschlager, Paul Ungermann, Stephan Günnemann, Leo Schwinn. 31 Oct 2025. [DiffM]

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong. 17 Oct 2025. [AAML]
LLM Jailbreak Detection for (Almost) Free!
Guorui Chen, Yifan Xia, Xiaojun Jia, Ruoyao Xiao, Juil Sock, Jindong Gu. 18 Sep 2025.

Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau. 18 Sep 2025.

HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena Tutubalina, Oleg Y. Rogov. 22 Aug 2025.

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu. 21 Aug 2025. [AAML, MU, KELM]

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang. 16 Aug 2025. [LRM]
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li. 14 Aug 2025. [AAML, SILM]

GPT, But Backwards: Exactly Inverting Language Model Outputs
Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas C. Cordeiro. 02 Jul 2025.

VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang. 27 Jun 2025. [AAML, VLM]

MEF: A Capability-Aware Multi-Encryption Framework for Evaluating Vulnerabilities in Black-Box Large Language Models
Mingyu Yu, Wei Wang, Y. X. Wei, Sujuan Qin, Fei Gao, Wenmin Li. 29 May 2025. [AAML]

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Taiye Chen, Zeming Wei, Ang Li, Yisen Wang. 21 May 2025. [AAML]
BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
Wenqi Lyu, Zerui Li, Yanyuan Qiao, Qi Wu. 18 May 2025. [AAML]

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David Evans. 23 Apr 2025. [LLMSV]

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera. 07 Apr 2025. [AAML]

Don't Lag, RAG: Training-Free Adversarial Detection Using RAG
Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar. 07 Apr 2025. [VLM, ObjD, AAML]

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Computer Vision and Pattern Recognition (CVPR), 2025
Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang. 26 Mar 2025. [AAML]
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin E. Wu, Francesco Pinto, Zhongfu Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li. 20 Mar 2025. [LLMAG, AAML]

AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
Dillon Bowen, Ann-Kathrin Dombrowski, Adam Gleave, Chris Cundy. 17 Mar 2025. [ELM]

JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang. 26 Feb 2025. [ELM]

KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal. 05 Feb 2025. [AAML]

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Neural Information Processing Systems (NeurIPS), 2024
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang. 28 Jan 2025.
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jonathan Nöther, Adish Singla, Goran Radanović. 14 Jan 2025. [AAML]

Global Challenge for Safe and Secure LLMs Track 1
Yang Liu, Yihao Huang, Yang Liu, Peng Yan Tan, Weng Kuan Yau, ..., Yan Wang, Rick Siow Mong Goh, Liangli Zhen, Yingjie Zhang, Zhe Zhao. 21 Nov 2024. [ELM, AILaw]

DROJ: A Prompt-Driven Attack against Large Language Models
Leyang Hu, Boran Wang. 14 Nov 2024.

Diversity Helps Jailbreak Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao. 06 Nov 2024. [AAML]

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Zijun Wang, Haoqin Tu, J. Mei, Bingchen Zhao, Yanjie Wang, Cihang Xie. 11 Oct 2024.
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
International Conference on Learning Representations (ICLR), 2024
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin. 09 Oct 2024.

FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi. 02 Oct 2024. [AAML]

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu. 11 Sep 2024. [AAML]

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang. 05 Sep 2024. [PILM, AAML]

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang. 01 Sep 2024. [AAML]
On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective
Tal Alter, Raz Lapid, Moshe Sipper. 25 Aug 2024. [AAML]

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Neural Information Processing Systems (NeurIPS), 2024
Jingtong Su, Mingyu Lee, SangKeun Lee. 02 Aug 2024.

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
Huiyu Xu, Wenhui Zhang, Peng Kuang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren. 23 Jul 2024. [AAML, LLMAG]

Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion. 16 Jul 2024.

Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia. 11 Jul 2024. [AAML]
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li. 05 Jul 2024. [AAML]

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Xiaotian Zou, Ke Li, Yongkang Chen. 01 Jul 2024. [MLLM]

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang. 26 Jun 2024. [PILM]

Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study
Swarm and Evolutionary Computation (Swarm Evol. Comput.), 2024
Hao Hao, Xiaoqun Zhang, Aimin Zhou. 15 Jun 2024. [ELM]

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang. 13 Jun 2024. [ELM]
AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou. 06 Jun 2024. [AAML]

Safeguarding Large Language Models: A Survey
Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang. 03 Jun 2024. [OffRL, KELM, AILaw]

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin. 03 Jun 2024. [AAML]

Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu, Chenhui Hu. 01 Jun 2024. [AAML]

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Yang Liu, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Simeng Qin, Min Lin. 31 May 2024. [AAML]