"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak
  Prompts on Large Language Models

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

7 August 2023
Xinyue Shen
Z. Chen
Michael Backes
Yun Shen
Yang Zhang
    SILM
ArXivPDFHTML

Papers citing ""Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models"

50 / 178 papers shown
LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
Muhammad Ibrahim, Güliz Seray Tuncay, Z. Berkay Celik, Aravind Machiry, Antonio Bianchi
13 May 2025

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade
AAML, SILM
07 May 2025

Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang, Ke Ma, X. Jia, Yingfei Sun, Qianqian Xu, Q. Huang
AAML
03 May 2025

XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, V. P.
30 Apr 2025

ACE: A Security Architecture for LLM-Integrated App Systems
Evan Li, Tushin Mallick, Evan Rose, William K. Robertson, Alina Oprea, Cristina Nita-Rotaru
29 Apr 2025

JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David A. Wagner
AAML
28 Apr 2025

Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Yakai Li, Jiekang Hu, Weiduan Sang, Luping Ma, Jing Xie, Weijuan Zhang, Aimin Yu, Shijie Zhao, Qingjia Huang, Qihang Zhou
AAML
28 Apr 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao, Shibo Hong, X. Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu Jiang
ALM, ELM
26 Apr 2025

DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization
Xinzhe Huang, Kedong Xiu, T. Zheng, Churui Zeng, Wangze Ni, Zhan Qin, K. Ren, C. L. P. Chen
AAML
21 Apr 2025

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
Léo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, ..., Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dvijotham
18 Apr 2025

Activated LoRA: Fine-tuned LLMs for Intrinsics
Kristjan Greenewald, Luis A. Lastras, Thomas Parnell, Vraj Shah, Lucian Popa, Giulio Zizzo, Chulaka Gunasekara, Ambrish Rawat, David D. Cox
16 Apr 2025

AttentionDefense: Leveraging System Prompt Attention for Explainable Defense Against Novel Jailbreaks
Charlotte Siska, Anush Sankaran
AAML
10 Apr 2025

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera
AAML
07 Apr 2025

Large Language Model (LLM) for Software Security: Code Analysis, Malware Analysis, Reverse Engineering
Hamed Jelodar, Samita Bai, Parisa Hamedi, Hesamodin Mohammadian, R. Razavi-Far, Ali Ghorbani
07 Apr 2025

Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
AAML
07 Apr 2025

Rethinking Reflection in Pre-Training
Essential AI: Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, ..., Philip Monk, Platon Mazarakis, Ritvik Kapila, Saurabh Srivastava, Tim Romanski
ReLM, LRM
05 Apr 2025

PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang
MLLM, AAML
02 Apr 2025

Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models
Shih-Wen Ke, Guan-Yu Lai, Guo-Lin Fang, Hsi-Yuan Kao
SILM
26 Mar 2025

TeleLoRA: Teleporting Model-Specific Alignment Across LLMs
Xiao Lin, Manoj Acharya, Anirban Roy, Susmit Jha
MoMe
26 Mar 2025

MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You, Bryan Hooi, Yiwei Wang, Y. Wang, Zong Ke, Ming Yang, Zi Huang, Yujun Cai
AAML
24 Mar 2025

Safe Vision-Language Models via Unsafe Weights Manipulation
Moreno D'Incà, E. Peruzzo, Xingqian Xu, Humphrey Shi, N. Sebe, Massimiliano Mancini
MU
14 Mar 2025

Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States
Xin Wei Chia, Jonathan Pan
AAML
12 Mar 2025

Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
Wenlong Meng, Fan Zhang, Wendao Yao, Zhenyuan Guo, Y. Li, Chengkun Wei, Wenzhi Chen
AAML
11 Mar 2025

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic
07 Mar 2025

ToxicSQL: Migrating SQL Injection Threats into Text-to-SQL Models via Backdoor Attack
Meiyu Lin, Haichuan Zhang, Jiale Lao, Renyuan Li, Yuanchun Zhou, Carl Yang, Yang Cao, Mingjie Tang
SILM
07 Mar 2025

Improving LLM Safety Alignment with Dual-Objective Optimization
Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song
AAML, MU
05 Mar 2025

LLM-Safety Evaluations Lack Robustness
Tim Beyer, Sophie Xhonneux, Simon Geisler, Gauthier Gidel, Leo Schwinn, Stephan Günnemann
ALM, ELM
04 Mar 2025

EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
Franck Cappello, Sandeep Madireddy, Robert Underwood, N. Getty, Nicholas Chia, ..., M. Rafique, Eliu A. Huerta, B. Li, Ian Foster, Rick L. Stevens
27 Feb 2025

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann
AAML
24 Feb 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou
24 Feb 2025

Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah Goldblum
AAML
12 Feb 2025

Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao
06 Feb 2025

KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal
AAML
05 Feb 2025

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu
04 Feb 2025

Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang, Yixin Wu, Rui Wen, Michael Backes, Yang Zhang
03 Feb 2025

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
60
1
0
02 Feb 2025
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang
28 Jan 2025

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
AAML
28 Jan 2025

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta, Sanjana Nambiar, Raz Besaleli, Samuel Dooley, Max Cembalest, John P. Dickerson
ALM
28 Jan 2025

LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu, Junfeng Fang, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen
AAML
03 Jan 2025

Robustness of Large Language Models Against Adversarial Attacks
Yiyi Tao, Yixian Shen, Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du
AAML
22 Dec 2024

OpenAI o1 System Card
OpenAI: Aaron Jaech, Adam Tauman Kalai, Adam Lerer, ..., Yuchen He, Yuchen Zhang, Yunyun Wang, Zheng Shao, Zhuohan Li
ELM, LRM, AI4CE
21 Dec 2024

Position: A taxonomy for reporting and describing AI security incidents
L. Bieringer, Kevin Paeth, Andreas Wespi, Kathrin Grosse, Alexandre Alahi
19 Dec 2024

RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks
Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Min Yang
SILM
21 Nov 2024

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
Gabriel Chua, Shing Yee Chan, Shaun Khoo
20 Nov 2024

SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
Ruoxi Sun, Jiamin Chang, Hammond Pearce, Chaowei Xiao, B. Li, Qi Wu, Surya Nepal, Minhui Xue
17 Nov 2024

Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models
Yiqi Yang, Hongye Fu
AAML
31 Oct 2024

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman, Omer Hofman, Oren Rachmil, Inderjeet Singh, Vikas Pahuja, Rathina Sabapathy Aishvariya Priya, Amit Giloni, Roman Vainshtein, Hisashi Kojima
21 Oct 2024

SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
Aidan Wong, He Cao, Zijing Liu, Yu Li
21 Oct 2024

When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?
Shang Wang, Tianqing Zhu, Dayong Ye, Wanlei Zhou
MU
20 Oct 2024