ResearchTrend.AI
Home › Papers › 2404.01833 › Cited By
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
2 April 2024
M. Russinovich, Ahmed Salem, Ronen Eldan
Papers citing "Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack"

50 / 65 papers shown
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka, Jimmy Dani, Ausmit Mondal, Nitesh Saxena
AAML · 08 May 2025

Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu, Weichen Yu, L. Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, Matt Fredrikson
AAML, VLM · 02 May 2025

Safety in Large Reasoning Models: A Survey
Cheng Wang, Y. Liu, B. Li, Duzhen Zhang, Z. Li, Junfeng Fang
LRM · 24 Apr 2025

The Structural Safety Generalization Problem
Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine
AAML · 13 Apr 2025

Bypassing Safety Guardrails in LLMs Using Humor
Pedro Cisneros-Velarde
09 Apr 2025

Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty
Yu Inatsu
04 Apr 2025
Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning
S. Chen, Xiao Yu, Ninareh Mehrabi, Rahul Gupta, Zhou Yu, Ruoxi Jia
AAML, LLMAG · 02 Apr 2025

Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing
Johan Wahréus, Ahmed Mohamed Hussain, P. Papadimitratos
27 Mar 2025

MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You, Bryan Hooi, Yiwei Wang, Y. Wang, Zong Ke, Ming Yang, Zi Huang, Yujun Cai
AAML · 24 Mar 2025

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
Andy Zhou
MU · 13 Mar 2025

Safety Guardrails for LLM-Enabled Robots
Zachary Ravichandran, Alexander Robey, Vijay R. Kumar, George Pappas, Hamed Hassani
10 Mar 2025

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger, Boussad Addad, Katarzyna Kapusta
AAML · 08 Mar 2025

Jailbreaking is (Mostly) Simpler Than You Think
M. Russinovich, Ahmed Salem
AAML · 07 Mar 2025
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, Siva Reddy
LLMAG, ELM · 06 Mar 2025

Improving LLM Safety Alignment with Dual-Objective Optimization
Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song
AAML, MU · 05 Mar 2025

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
AAML · 03 Mar 2025

Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models
Nimet Beyza Bozdag, Shuhaib Mehri, Gökhan Tür, Dilek Hakkani-Tür
03 Mar 2025

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
AAML, LLMSV · 28 Feb 2025

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng, Xiaolong Jin, Jinyuan Jia, X. Zhang
AAML · 27 Feb 2025

Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
Shanshan Han, Salman Avestimehr, Chaoyang He
12 Feb 2025
Jailbreaking to Jailbreak
Jeremy Kritz, Vaughn Robinson, Robert Vacareanu, Bijan Varjavand, Michael Choi, Bobby Gogov, Scale Red Team, Summer Yue, Willow Primack, Zifan Wang
09 Feb 2025

Confidence Elicitation: A New Attack Vector for Large Language Models
Brian Formento, Chuan-Sheng Foo, See-Kiong Ng
AAML · 07 Feb 2025

Lessons From Red Teaming 100 Generative AI Products
Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, ..., Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich
AAML, VLM · 13 Jan 2025

MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
Fengxiang Wang, Ranjie Duan, Peng Xiao, Xiaojun Jia, Shiji Zhao, ..., Hang Su, Jialing Tao, Hui Xue, J. Zhu, Hui Xue
LLMAG · 08 Jan 2025

Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation
Xuying Li, Zhuo Li, Yuji Kosuga, Yasuhiro Yoshida, Victor Bian
AAML · 05 Dec 2024

RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks
Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Min Yang
SILM · 21 Nov 2024
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien, David Majercak, Xavier Fernandes, Richard Edgar, Jingya Chen, Harsha Nori, Dean Carignan, Eric Horvitz, Forough Poursabzi-Sangdeh
LLMSV · 18 Nov 2024

Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, ..., Victor C. Dibia, Ahmed Hassan Awadallah, Ece Kamar, Rafah Hosn, Saleema Amershi
AI4CE, LRM, LLMAG · 07 Nov 2024

Plentiful Jailbreaks with String Compositions
Brian R. Y. Huang
AAML · 01 Nov 2024

Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In
Itay Nakash, George Kour, Guy Uziel, Ateret Anaby-Tavor
AAML, LLMAG · 22 Oct 2024

Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari
AAML · 15 Oct 2024

Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
Siddharth Mitra, Andre Wibisono
14 Oct 2024
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, Dawei Yin, Hao Liu
ELM · 11 Oct 2024

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, ..., Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
LLMAG · 11 Oct 2024

Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation
Tomas Bueno Momcilovic, Beat Buesser, Giulio Zizzo, Mark Purcell
AAML · 10 Oct 2024

Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs
Tomas Bueno Momcilovic, Beat Buesser, Giulio Zizzo, Mark Purcell, Dian Balta
AAML · 04 Oct 2024

Developing Assurance Cases for Adversarial Robustness and Regulatory Compliance in LLMs
Tomas Bueno Momcilovic, Dian Balta, Beat Buesser, Giulio Zizzo, Mark Purcell
AAML · 04 Oct 2024

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer, Vítor Albiero, Joanna Bitton, Hailey Nguyen, J. Li, Cristian Canton Ferrer, Ivan Evtimov, Aaron Grattafiori
ALM · 02 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi
AAML · 02 Oct 2024

Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang, Maximilian Li, Leonard Tang
AAML · 02 Oct 2024

PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System
Gary D. Lopez Munoz, Amanda Minnich, Roman Lutz, Richard Lundeen, Raja Sekhar Rao Dheekonda, ..., Tori Westerhoff, Chang Kawaguchi, Christian Seifert, Ram Shankar Siva Kumar, Yonatan Zunger
SILM · 01 Oct 2024

VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data
Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes
VLM, AAML · 01 Oct 2024

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell
26 Sep 2024

Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)
Alan Aqrawi, Arian Abbasi
AAML · 04 Sep 2024
Conversational Complexity for Assessing Risk in Large Language Models
John Burden, Manuel Cebrian, José Hernández-Orallo
02 Sep 2024

Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Jason Zhang, Julius Broomfield, Sara Pieri, Reihaneh Iranmanesh, Reihaneh Rabbany, Kellin Pelrine
AAML · 29 Aug 2024

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
AAML, MU · 27 Aug 2024

Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Hui Li
AAML · 08 Aug 2024

Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion
16 Jul 2024

Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers
Terry Tong, Jiashu Xu, Qin Liu, Muhao Chen
AAML, SILM · 04 Jul 2024