Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Neural Information Processing Systems (NeurIPS), 2023
4 December 2023
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi

Papers citing "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically"

50 of 167 citing papers shown.

Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian
03 Dec 2025

A Safety and Security Framework for Real-World Agentic Systems
Shaona Ghosh, Barnaby Simkin, Kyriacos Shiarlis, Soumili Nandi, Dan Zhao, ..., Nikki Pope, Roopa Prabhu, Daniel Rohrer, Michael Demoret, Bartley Richardson
27 Nov 2025

Can LLMs Threaten Human Survival? Benchmarking Potential Existential Threats from LLMs via Prefix Completion
Yu Cui, Yifei Liu, Hang Fu, Sicheng Pan, Haibin Zhang, Cong Zuo, Licheng Wang
24 Nov 2025

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia
23 Nov 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun
AAML
20 Nov 2025

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma
AAML
16 Nov 2025

AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren, Shahar Katz, Lior Wolf
AAML
15 Nov 2025

KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs
Shuyuan Liu, Jiawei Chen, Xiao Yang, Hang Su, Z. Yin
AAML
09 Nov 2025

Jailbreaking in the Haystack
Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
05 Nov 2025

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Z. Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
27 Oct 2025

NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge
Hanyu Zhu, Lance Fiondella, Jiawei Yuan, K. Zeng, Long Jiao
SILM, AAML, KELM
24 Oct 2025

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
24 Oct 2025

HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models
Sidhant Narula, J. Asl, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi
AAML
21 Oct 2025

Black-box Optimization of LLM Outputs by Asking for Directions
Jie Zhang, Meng Ding, Yang Liu, Jue Hong, F. Tramèr
AAML
19 Oct 2025

BreakFun: Jailbreaking LLMs via Schema Exploitation
Amirkia Rafiei Oskooei, Mehmet S. Aktas
19 Oct 2025

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
AAML
17 Oct 2025

Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, X. Zhang
16 Oct 2025

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Journal of Network and Computer Applications (JNCA), 2025
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh
11 Oct 2025

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections
Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander Schulhoff, Jamie Hayes, ..., Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Seth Neel, F. Tramèr
AAML, ELM
10 Oct 2025

CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
10 Oct 2025

A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
Bibekananda Patra, Aditya Mahesh Kolte, Sandipan Bandyopadhyay
10 Oct 2025

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai, Jun Sakuma
09 Oct 2025

LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback
Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė, Maura Pintor, Amin Karbasi, Battista Biggio
AAML
07 Oct 2025

Imperceptible Jailbreaking against Large Language Models
Kuofeng Gao, Y. Li, Chao Du, X. Wang, Xingjun Ma, Shu-Tao Xia, Tianyu Pang
AAML
06 Oct 2025

Proactive defense against LLM Jailbreak
Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi, Zhou Yu, Junfeng Yang
AAML
06 Oct 2025

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents
Yanjie Li, Yiming Cao, Dong Wang, Bin Xiao
LLMAG, AAML
05 Oct 2025

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Buyun Liang, Liangzu Peng, Jinqi Luo, D. Thaker, Kwan Ho Ryan Chan, Rene Vidal
AAML
05 Oct 2025

Bypassing Prompt Guards in Production with Controlled-Release Prompting
Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang
SILM, AAML
02 Oct 2025

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar, Miguel Ballesteros, Alan Ritter, Dan Roth
AAML
02 Oct 2025

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
Isha Gupta, Rylan Schaeffer, Joshua Kazdan, Katja Filippova, Sanmi Koyejo
OOD, AAML
01 Oct 2025

Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Shojiro Yamabe, Jun Sakuma
AAML
01 Oct 2025

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Jingdi Lei, Varun Gumma, Rishabh Bhardwaj, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria
LLMAG, ALM, ELM
30 Sep 2025

RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
X. Chen, Jian Zhao, Yuchen Yuan, T. Zhang, Huilin Zhou, ..., Ping Hu, Linghe Kong, Chi Zhang, Weiran Huang, Xuelong Li
28 Sep 2025

Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Francesco Marchiori, Rohan Sinha, Christopher Agia, Alexander Robey, George Pappas, Mauro Conti, Marco Pavone
AAML
27 Sep 2025

GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models
Javad Forough, Mohammad Maheri, Hamed Haddadi
AAML
27 Sep 2025

AI Kill Switch for malicious web-based LLM agent
Sechan Lee, Sangdon Park
LLMAG, AAML
26 Sep 2025

Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
18 Sep 2025

NeuroStrike: Neuron-Level Attacks on Aligned LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, S. Picek, A. Sadeghi
AAML
15 Sep 2025

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
12 Sep 2025

A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He
HILM, LRM
04 Sep 2025

Unraveling LLM Jailbreaks Through Safety Knowledge Neurons
Chongwen Zhao, Kaizhu Huang
AAML, KELM
01 Sep 2025

CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention
Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho
01 Sep 2025

Safety Alignment Should Be Made More Than Just A Few Attention Heads
Chao Huang, Zefeng Zhang, Juewei Yue, Quangang Li, Chuang Zhang, Tingwen Liu
AAML
27 Aug 2025

Evaluating Language Model Reasoning about Confidential Information
Dylan Sam, Alexander Robey, Andy Zou, Matt Fredrikson, J. Zico Kolter
ELM, LRM
27 Aug 2025

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks
Sheng Liu, Qiang Sheng, Danding Wang, Yang Li, Guang Yang, Juan Cao
27 Aug 2025

GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
Melissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Sorower
23 Aug 2025

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu
AAML, MU, KELM
21 Aug 2025

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
AAML
20 Aug 2025

Building and Measuring Trust between Large Language Models
Maarten Buyl, Yousra Fettach, Guillaume Bied, Tijl De Bie
LLMAG, HILM
20 Aug 2025

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection
Jiaming Hu, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis
AAML
19 Aug 2025