2311.16119
Cited By
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
24 October 2023
Sander Schulhoff
Jeremy Pinto
Anaum Khan
Louis-François Bouchard
Chenglei Si
Svetlina Anati
Valen Tagliabue
Anson Liu Kost
Christopher Carnahan
Jordan L. Boyd-Graber
SILM
Papers citing
"Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition"
10 / 10 papers shown
Title
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
Zihao Wang
Yibo Jiang
Jiahao Yu
Heqing Huang
33
0
0
01 May 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
63
0
0
08 Mar 2025
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
101
1
0
20 Dec 2024
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models
Shengye Wan
Cyrus Nikolaidis
Daniel Song
David Molnar
James Crnkovich
...
Spencer Whitman
Stephanie Ding
Vlad Ionescu
Yue Li
Joshua Saxe
ELM
31
18
0
02 Aug 2024
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
33
2
0
19 Jul 2024
Knowledge Return Oriented Prompting (KROP)
Jason Martin
Kenneth Yeung
25
0
0
11 Jun 2024
StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen
Julien Piet
Chawin Sitawarin
David A. Wagner
SILM
AAML
22
65
0
09 Feb 2024
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
102
84
0
21 Aug 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Úlfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020