Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of
LLMs through a Global Scale Prompt Hacking Competition

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

24 October 2023

Sander Schulhoff

Louis-Franccois Bouchard

Valen Tagliabue

Christopher Carnahan

Jordan L. Boyd-Graber

Papers citing "Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition"

10 / 10 papers shown

Title
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) Zihao Wang Yibo Jiang Jiahao Yu Heqing Huang 33 0 0 01 May 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models Thomas Winninger Boussad Addad Katarzyna Kapusta AAML 63 0 0 08 Mar 2025
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context Nilanjana Das Edward Raff Manas Gaur AAML 101 1 0 20 Dec 2024
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models Shengye Wan Cyrus Nikolaidis Daniel Song David Molnar James Crnkovich ... Spencer Whitman Stephanie Ding Vlad Ionescu Yue Li Joshua Saxe ELM 31 18 0 02 Aug 2024
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context Nilanjana Das Edward Raff Manas Gaur AAML 33 2 0 19 Jul 2024
Knowledge Return Oriented Prompting (KROP) Jason Martin Kenneth Yeung 25 0 0 11 Jun 2024
StruQ: Defending Against Prompt Injection with Structured Queries Sizhe Chen Julien Piet Chawin Sitawarin David A. Wagner SILM AAML 22 65 0 09 Feb 2024
On the Adversarial Robustness of Multi-Modal Foundation Models Christian Schlarmann Matthias Hein AAML 102 84 0 21 Aug 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 303 11,730 0 04 Mar 2022
Extracting Training Data from Large Language Models Nicholas Carlini Florian Tramèr Eric Wallace Matthew Jagielski Ariel Herbert-Voss ... Tom B. Brown D. Song Ulfar Erlingsson Alina Oprea Colin Raffel MLAU SILM 267 1,798 0 14 Dec 2020