Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Neural Information Processing Systems (NeurIPS), 2023
arXiv:2312.02119, 4 December 2023
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi

Papers citing "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically" (50 of 164 papers shown)

Can LLMs Threaten Human Survival? Benchmarking Potential Existential Threats from LLMs via Prefix Completion
Yu Cui, Yifei Liu, Hang Fu, Sicheng Pan, Haibin Zhang, Cong Zuo, Licheng Wang (24 Nov 2025)

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia (23 Nov 2025)

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun (20 Nov 2025) [AAML]

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma (16 Nov 2025) [AAML]

AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren, Shahar Katz, Lior Wolf (15 Nov 2025) [AAML]

KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs
Shuyuan Liu, Jiawei Chen, Xiao Yang, Hang Su, Z. Yin (09 Nov 2025) [AAML]

Jailbreaking in the Haystack
Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan (05 Nov 2025)

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Z. Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She (27 Oct 2025)

NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge
Hanyu Zhu, Lance Fiondella, Jiawei Yuan, K. Zeng, Long Jiao (24 Oct 2025) [SILM, AAML, KELM]

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang (24 Oct 2025)

HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models
Sidhant Narula, J. Asl, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi (21 Oct 2025) [AAML]

Black-box Optimization of LLM Outputs by Asking for Directions
Jie Zhang, Meng Ding, Yang Liu, Jue Hong, F. Tramèr (19 Oct 2025) [AAML]

BreakFun: Jailbreaking LLMs via Schema Exploitation
Amirkia Rafiei Oskooei, Mehmet S. Aktas (19 Oct 2025)

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong (17 Oct 2025) [AAML]

Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, X. Zhang (16 Oct 2025)

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Journal of Network and Computer Applications (JNCA), 2025
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh (11 Oct 2025)

CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner (10 Oct 2025)

A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
Bibekananda Patra, Aditya Mahesh Kolte, Sandipan Bandyopadhyay (10 Oct 2025)

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections
Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander Schulhoff, Jamie Hayes, ..., Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Seth Neel, F. Tramèr (10 Oct 2025) [AAML, ELM]

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai, Jun Sakuma (09 Oct 2025)

LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback
Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė, Maura Pintor, Amin Karbasi, Battista Biggio (07 Oct 2025) [AAML]

Imperceptible Jailbreaking against Large Language Models
Kuofeng Gao, Y. Li, Chao Du, X. Wang, Xingjun Ma, Shu-Tao Xia, Tianyu Pang (06 Oct 2025) [AAML]

Proactive Defense against LLM Jailbreak
Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi, Zhou Yu, Junfeng Yang (06 Oct 2025) [AAML]

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents
Yanjie Li, Yiming Cao, Dong Wang, Bin Xiao (05 Oct 2025) [LLMAG, AAML]

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Buyun Liang, Liangzu Peng, Jinqi Luo, D. Thaker, Kwan Ho Ryan Chan, Rene Vidal (05 Oct 2025) [AAML]

Bypassing Prompt Guards in Production with Controlled-Release Prompting
Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang (02 Oct 2025) [SILM, AAML]

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar, Miguel Ballesteros, Alan Ritter, Dan Roth (02 Oct 2025) [AAML]

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
Isha Gupta, Rylan Schaeffer, Joshua Kazdan, Katja Filippova, Sanmi Koyejo (01 Oct 2025) [OOD, AAML]

Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Shojiro Yamabe, Jun Sakuma (01 Oct 2025) [AAML]

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Jingdi Lei, Varun Gumma, Rishabh Bhardwaj, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria (30 Sep 2025) [LLMAG, ALM, ELM]

RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
X. Chen, Jian Zhao, Yuchen Yuan, T. Zhang, Huilin Zhou, ..., Ping Hu, Linghe Kong, Chi Zhang, Weiran Huang, Xuelong Li (28 Sep 2025)

Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Francesco Marchiori, Rohan Sinha, Christopher Agia, Alexander Robey, George Pappas, Mauro Conti, Marco Pavone (27 Sep 2025) [AAML]

GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models
Javad Forough, Mohammad Maheri, Hamed Haddadi (27 Sep 2025) [AAML]

AI Kill Switch for Malicious Web-based LLM Agents
Sechan Lee, Sangdon Park (26 Sep 2025) [LLMAG, AAML]

Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau (18 Sep 2025)

NeuroStrike: Neuron-Level Attacks on Aligned LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, S. Picek, A. Sadeghi (15 Sep 2025) [AAML]

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist (12 Sep 2025)

A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He (04 Sep 2025) [HILM, LRM]

CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention
Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho (01 Sep 2025)

Unraveling LLM Jailbreaks Through Safety Knowledge Neurons
Chongwen Zhao, Kaizhu Huang (01 Sep 2025) [AAML, KELM]

Safety Alignment Should Be Made More Than Just A Few Attention Heads
Chao Huang, Zefeng Zhang, Juewei Yue, Quangang Li, Chuang Zhang, Tingwen Liu (27 Aug 2025) [AAML]

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks
Sheng Liu, Qiang Sheng, Danding Wang, Yang Li, Guang Yang, Juan Cao (27 Aug 2025)

Evaluating Language Model Reasoning about Confidential Information
Dylan Sam, Alexander Robey, Andy Zou, Matt Fredrikson, J. Zico Kolter (27 Aug 2025) [ELM, LRM]

GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
Melissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Sorower (23 Aug 2025)

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu (21 Aug 2025) [AAML, MU, KELM]

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu (20 Aug 2025) [AAML]

Building and Measuring Trust between Large Language Models
Maarten Buyl, Yousra Fettach, Guillaume Bied, Tijl De Bie (20 Aug 2025) [LLMAG, HILM]

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection
Jiaming Hu, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis (19 Aug 2025) [AAML]

FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning Balance
Jianhao Chen, Mayi Xu, Xiaohu Li, Yongqi Li, Xiangyu Zhang, Jianjie Huang, T. Qian (18 Aug 2025) [LRM]

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo (17 Aug 2025)