arXiv:2404.07921
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
Zeyi Liao, Huan Sun · 11 April 2024 · AAML
Papers citing "AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs" (50 of 63 shown)
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Wei Zhao, Zhe Li, Jun Sun · 04 Dec 2025 · AAML

Are LLMs Good Safety Agents or a Propaganda Engine?
Neemesh Yadav, Francesco Ortu, Jiarui Liu, Joeun Yook, Bernhard Schölkopf, Rada Mihalcea, Alberto Cazzaniga, Zhijing Jin · 28 Nov 2025

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia · 23 Nov 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun · 20 Nov 2025 · AAML

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann · 06 Nov 2025 · AAML

Black-box Optimization of LLM Outputs by Asking for Directions
Jie Zhang, Meng Ding, Yang Liu, Jue Hong, F. Tramèr · 19 Oct 2025 · AAML

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong · 17 Oct 2025 · AAML

A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
Bibekananda Patra, Aditya Mahesh Kolte, Sandipan Bandyopadhyay · 10 Oct 2025

VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
Aofan Liu, Lulu Tang · 09 Oct 2025 · MLLM, VLM

Untargeted Jailbreak Attack
Xinzhe Huang, Wenjing Hu, Tianhang Zheng, Kedong Xiu, Xiaojun Jia, Haiyan Zhao, Zhan Qin, Kui Ren · 03 Oct 2025 · AAML

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Juil Sock, Adel Bibi, Tongliang Liu · 25 Sep 2025 · AAML

Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
Junjie Mu, Zonghao Ying, Zhekui Fan, Zonglei Jing, Yaoyuan Zhang, Zhengmin Yu, Wenxin Zhang, Quanchen Zou, Xiangzheng Zhang · 08 Sep 2025 · AAML

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu · 21 Aug 2025 · AAML, MU, KELM

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu · 20 Aug 2025 · AAML

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee · 19 Aug 2025 · SILM

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo · 17 Aug 2025

SceneJailEval: A Scenario-Adaptive Multi-Dimensional Framework for Jailbreak Evaluation
Lai Jiang, Yuekang Li, Xiaohan Zhang, Youtao Ding, Li Pan · 08 Aug 2025

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin · 05 Aug 2025 · AAML

Activation-Guided Local Editing for Jailbreaking Attacks
Jiecong Wang, Haoran Li, Hao Peng, Ziqian Zeng, Zihao Wang, Haohua Du, Zhengtao Yu · 01 Aug 2025 · AAML

Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao · 09 Jul 2025 · AAML

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang, Zixun Zhang, Zizhou Wang, Xiaobing Sun, Zhen Li, Liangli Zhen, Xiaohua Xu · 20 Jun 2025 · AAML

Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara · 11 Jun 2025

Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
Yukai Zhou, Sibei Yang, Wenjie Wang · 09 Jun 2025 · AAML

Adversarial Preference Learning for Robust LLM Alignment (ACL 2025)
Yuanfu Wang, Pengyu Wang, Chunyu Li, Bo Tang, Junyi Zhu, ..., Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Junchi Yan · 30 May 2025 · AAML

One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs (ICLR 2025)
Linbao Li, Y. Liu, Daojing He, Yu Li · 23 May 2025 · AAML

Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives
Wenhan Chang, Tianqing Zhu, Yu Zhao, Shuangyong Song, Ping Xiong, Wanlei Zhou · 23 May 2025

Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye · 21 May 2025 · AAML

Adversarial Suffix Filtering: a Defense Pipeline for LLMs
David Khachaturov, Robert D. Mullins · 14 May 2025 · AAML

Demystifying optimized prompts in language models
Rimon Melamed, Lucas H. McCabe, H. H. Huang · 04 May 2025

LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution
Zhuoran Yang, Jie Peng · 02 Apr 2025 · AAML

PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang · 02 Apr 2025 · MLLM, AAML

Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu · 28 Mar 2025

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger, Boussad Addad, Katarzyna Kapusta · 08 Mar 2025 · AAML

LLM-Safety Evaluations Lack Robustness
Tim Beyer, Sophie Xhonneux, Simon Geisler, Gauthier Gidel, Leo Schwinn, Stephan Günnemann · 04 Mar 2025 · ALM, ELM

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang · 24 Feb 2025 · ALM, ELM

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschlager, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann · 24 Feb 2025 · AAML

TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice (NAACL 2025)
Aman Goel, Xian Carrie Wu, Zhe Wang, Dmitriy Bespalov, Yanjun Qi · 21 Feb 2025

Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
Yue Xu, Chengyan Fu, Li Xiong, Sibei Yang, Wenjie Wang · 17 Feb 2025

StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models
Shehel Yoosuf, Temoor Ali, Ahmed Lekssays, Mashael Alsabah, Issa M. Khalil · 17 Feb 2025

Fast Proxies for LLM Robustness Evaluation
Tim Beyer, Jan Schuchardt, Leo Schwinn, Stephan Günnemann · 14 Feb 2025 · AAML

KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal · 05 Feb 2025 · AAML

Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das, Edward Raff, Aman Chadha, Manas Gaur · 20 Dec 2024 · AAML

Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation
Ke Zhao, Huayang Huang, Miao Li, Yu Wu · 21 Nov 2024 · AAML

AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
Vishal Kumar, Zeyi Liao, Jaylen Jones, Huan Sun · 29 Oct 2024 · AAML

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring (NAACL 2024)
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, ..., Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che · 28 Oct 2024 · AAML

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction (ICLR 2024)
Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang · 25 Oct 2024 · AAML

Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities (NAACL 2024)
Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, J. Gao · 24 Oct 2024

On the Role of Attention Heads in Large Language Model Safety (ICLR 2024)
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Cunchun Li, Yongbin Li · 17 Oct 2024

Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo · 15 Oct 2024 · AAML

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy (ICLR 2024)
Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou · 09 Oct 2024