DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
arXiv: 2402.16914 · 25 February 2024
Xirui Li, Ruochen Wang, Minhao Cheng, Tianyi Zhou, Cho-Jui Hsieh
Tags: AAML

Papers citing "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers" (7 of 7 papers shown):

Jailbreaking to Jailbreak (09 Feb 2025)
Jeremy Kritz, Vaughn Robinson, Robert Vacareanu, Bijan Varjavand, Michael Choi, Bobby Gogov, Scale Red Team, Summer Yue, Willow Primack, Zifan Wang

Robust LLM safeguarding via refusal feature adversarial training (30 Sep 2024)
L. Yu, Virginie Do, Karen Hambardzumyan, Nicola Cancedda
Tags: AAML

Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles (08 Aug 2024)
Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Hui Li
Tags: AAML

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner (08 Jun 2024)
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel
Tags: AAML

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts (19 Sep 2023)
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
Tags: SILM

Improving alignment of dialogue agents via targeted human judgements (28 Sep 2022)
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, …, John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
Tags: ALM, AAML

Training language models to follow instructions with human feedback (04 Mar 2022)
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, …, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM