Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Neural Information Processing Systems (NeurIPS), 2023. 4 December 2023.
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi.

Papers citing "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically"

50 of 167 citing papers shown.
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
Kristina Nikolić, Luze Sun, Jie Zhang, F. Tramèr. 14 Apr 2025.
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini, A. Orsino, Massimo Ruggiero, Domenico Talia. Machine-mediated learning (ML), 2025. 10 Apr 2025. [AAML, ELM]
NLP Security and Ethics, in the Wild
Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva. Transactions of the Association for Computational Linguistics (TACL), 2025. 09 Apr 2025. [SILM]
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li, Jian Shu, Jiang Wu, Ting Liu. 08 Apr 2025. [AAML]
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau. 07 Apr 2025. [AAML]
Multi-Agent Systems Execute Arbitrary Malicious Code
Harold Triedman, Rishi Jha, Vitaly Shmatikov. 15 Mar 2025. [LLMAG, AAML]
Safe Vision-Language Models via Unsafe Weights Manipulation
Moreno D'Incà, E. Peruzzo, Xingqian Xu, Humphrey Shi, Andrii Zadaianchuk, Goran Frehse. 14 Mar 2025. [MU]
MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming
Stefan Schoepf, Muhammad Zaid Hameed, Ambrish Rawat, Kieran Fraser, Giulio Zizzo, Giandomenico Cornacchia, Mark Purcell. 08 Mar 2025.
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger, Boussad Addad, Katarzyna Kapusta. 08 Mar 2025. [AAML]
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu. 28 Feb 2025. [AAML, LLMSV]
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Guohao Li, Juil Sock, Adel Bibi. International Conference on Learning Representations (ICLR), 2025. 26 Feb 2025.
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann. 24 Feb 2025. [AAML]
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, P. Sattigeri, Kush R. Varshney. 24 Feb 2025. [AAML]
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Seanie Lee, Dong Bok Lee, Dominik Wagner, Minki Kang, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 18 Feb 2025.
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen. 18 Feb 2025.
JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
Shenyi Zhang, Yuchen Zhai, Keyan Guo, Hongxin Hu, Shengnan Guo, Zheng Fang, Lingchen Zhao, Chao Shen, Cong Wang, Qian Wang. 11 Feb 2025. [AAML]
Confidence Elicitation: A New Attack Vector for Large Language Models
Brian Formento, Chuan-Sheng Foo, See-Kiong Ng. International Conference on Learning Representations (ICLR), 2025. 07 Feb 2025. [AAML]
KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal. 05 Feb 2025. [AAML]
Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang, Yixin Wu, Rui Wen, Michael Backes, Yang Zhang. 03 Feb 2025.
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang. Neural Information Processing Systems (NeurIPS), 2024. 28 Jan 2025.
Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment
Melissa Kazemi Rad, Huy Nghiem, Andy Luo, Sahil Wadhwa, Mohammad Sorower, Stephen Rawls. 22 Jan 2025. [AAML]
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora. Neural Information Processing Systems (NeurIPS), 2024. 20 Jan 2025. [ALM]
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
David Williams-King, Linh Le, Adam Oberman, Yoshua Bengio. 19 Jan 2025. [AAML]
Lessons From Red Teaming 100 Generative AI Products
Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, ..., Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich. 13 Jan 2025. [AAML, VLM]
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
Fengxiang Wang, Ranjie Duan, Peng Xiao, Yang Liu, Shiji Zhao, ..., Hang Su, Jialing Tao, Hui Xue, Jun Zhu, Hui Xue. 08 Jan 2025. [LLMAG]
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu, Cunchun Li, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen. 03 Jan 2025. [AAML]
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
Xiyang Hu. 01 Jan 2025. [AAML]
The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
Xikang Yang, Xuehai Tang, Jizhong Han, Songlin Hu. 18 Nov 2024.
Diversity Helps Jailbreak Large Language Models
Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao. North American Chapter of the Association for Computational Linguistics (NAACL), 2024. 06 Nov 2024. [AAML]
Transferable & Stealthy Ensemble Attacks: A Black-Box Jailbreaking Framework for Large Language Models
Yiqi Yang, Hongye Fu. 31 Oct 2024. [AAML]
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
Rui Pu, Chaozhuo Li, Rui Ha, Zejian Chen, Litian Zhang, Ziqiang Liu, Lirong Qiu, Xi Zhang. International Joint Conference on Artificial Intelligence (IJCAI), 2024. 18 Oct 2024. [AAML]
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo. 15 Oct 2024. [AAML]
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
Zi Wang, Divyam Anshumaan, Ashish Hooda, Yudong Chen, Somesh Jha. International Conference on Learning Representations (ICLR), 2024. 05 Oct 2024. [AAML]
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu, Peiran Li, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao. International Conference on Learning Representations (ICLR), 2024. 03 Oct 2024.
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang, Maximilian Li, Leonard Tang. International Conference on Learning Representations (ICLR), 2024. 02 Oct 2024. [AAML]
Multimodal Pragmatic Jailbreak on Text-to-image Models
Tong Liu, Zhixin Lai, Jiawen Wang, Gengyuan Zhang, Shuo Chen, Juil Sock, Vera Demberg, Volker Tresp, Jindong Gu. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. 27 Sep 2024.
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi. 23 Sep 2024. [SILM, AAML]
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang. 05 Sep 2024. [PILM, AAML]
LLMmap: Fingerprinting For Large Language Models
Dario Pasquini, Evgenios M. Kornaropoulos, G. Ateniese. 22 Jul 2024.
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan. 20 Jul 2024.
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia. 11 Jul 2024. [AAML]
Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement
Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Qi Qian, Xiaoqing Zheng, Qi Zhang. 01 Jul 2024. [AAML, LRM]
Poisoned LangChain: Jailbreak LLMs by LangChain
Ziqiu Wang, Jun Liu, Shengkai Zhang, Yang Yang. 26 Jun 2024.
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Jiayi Mao
Xueqi Cheng
AAML
208
19
0
17 Jun 2024
Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications
Stephen Burabari Tete. 16 Jun 2024.
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang. 13 Jun 2024. [ELM]
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
Avital Shafran, R. Schuster, Vitaly Shmatikov. 09 Jun 2024.
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel. 08 Jun 2024. [AAML]
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin. 03 Jun 2024. [AAML]
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho. 30 May 2024. [AAML]