Attacking Large Language Models with Projected Gradient Descent
arXiv 2402.09154, 14 February 2024
Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann
Categories: AAML, SILM

Papers citing "Attacking Large Language Models with Projected Gradient Descent" (8 of 8 papers shown)

OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao
AAML, 01 May 2025

Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao
06 Feb 2025

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, C. Pan, Minlie Huang, Lei Sha
23 Dec 2024

Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang, Maximilian Li, Leonard Tang
AAML, 02 Oct 2024

Discrete Randomized Smoothing Meets Quantum Computing
Md. Nazmus Sakib, Aman Saxena, Nicola Franco, Md Mashrur Arifin, Stephan Günnemann
AAML, 01 Aug 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
AAML, 02 Apr 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
AAML, 14 Feb 2024

Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela
SILM, 15 Apr 2021