Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.12329
Cited By
Query-Based Adversarial Prompt Generation
19 February 2024
Jonathan Hayase
Ema Borevkovic
Nicholas Carlini
Florian Tramèr
Milad Nasr
AAML
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Query-Based Adversarial Prompt Generation"
6 / 6 papers shown
Title
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
61
0
0
08 Mar 2025
Smoothed Embeddings for Robust Language Models
Ryo Hase
Md. Rafi Ur Rashid
Ashley Lewis
Jing Liu
T. Koike-Akino
K. Parsons
Y. Wang
AAML
44
0
0
27 Jan 2025
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
36
25
0
16 Jul 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
79
155
0
02 Apr 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
95
225
0
15 Apr 2021
1