ResearchTrend.AI
arXiv: 2402.16914

DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

25 February 2024
Xirui Li
Ruochen Wang
Minhao Cheng
Tianyi Zhou
Cho-Jui Hsieh
    AAML

Papers citing "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers"

7 / 7 papers shown
Title
Jailbreaking to Jailbreak
Jeremy Kritz
Vaughn Robinson
Robert Vacareanu
Bijan Varjavand
Michael Choi
Bobby Gogov
Scale Red Team
Summer Yue
Willow Primack
Zifan Wang
100
0
0
09 Feb 2025
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
51
9
0
30 Sep 2024
Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
Xiongtao Sun
Deyue Zhang
Dongdong Yang
Quanchen Zou
Hui Li
AAML
19
11
0
08 Aug 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
66
6
0
08 Jun 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022