arXiv:2412.10321
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
13 December 2024 · Sicheng Zhu, Brandon Amos, Yuandong Tian, Chuan Guo, Ivan Evtimov
Papers citing "AdvPrefix: An Objective for Nuanced LLM Jailbreaks" (8 papers shown)
AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann · 06 Nov 2025
MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation
Weisen Jiang, Sinno Jialin Pan · 09 Oct 2025
Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo · 17 Aug 2025
Jailbreak Distillation: Renewable Safety Benchmarking
Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson · 28 May 2025
Security Concerns for Large Language Models: A Survey
Miles Q. Li, Benjamin C. M. Fung · 24 May 2025
Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye · 21 May 2025
LLM-Safety Evaluations Lack Robustness
Tim Beyer, Sophie Xhonneux, Simon Geisler, Gauthier Gidel, Leo Schwinn, Stephan Günnemann · 04 Mar 2025
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschlager, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann · 24 Feb 2025