Multi-step Jailbreaking Privacy Attacks on ChatGPT

11 April 2023
Haoran Li
Dadi Guo
Wei Fan
Mingshi Xu
Jie Huang
Fanpu Meng
Yangqiu Song
    SILM
arXiv: 2304.05197

Papers citing "Multi-step Jailbreaking Privacy Attacks on ChatGPT"

50 / 235 papers shown
Title
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
X. Jia
Yingfei Sun
Qianqian Xu
Q. Huang
AAML
60
0
0
03 May 2025
Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Yakai Li
Jiekang Hu
Weiduan Sang
Luping Ma
Jing Xie
Weijuan Zhang
Aimin Yu
Shijie Zhao
Qingjia Huang
Qihang Zhou
AAML
45
0
0
28 Apr 2025
DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
Tamim Al Mahmud
N. Jebreel
Josep Domingo-Ferrer
David Sánchez
MU
25
0
0
18 Apr 2025
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
CheolWon Na
YunSeok Choi
Jee-Hyong Lee
AAML
29
0
0
18 Apr 2025
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Weixiang Zhao
Jiahe Guo
Yulin Hu
Yang Deng
An Zhang
...
Xinyang Han
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
AAML
LLMSV
39
0
0
13 Apr 2025
Understanding Users' Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms
Mutahar Ali
Arjun Arunasalam
Habiba Farrukh
SILM
46
0
0
09 Apr 2025
Large Language Model (LLM) for Software Security: Code Analysis, Malware Analysis, Reverse Engineering
Hamed Jelodar
Samita Bai
Parisa Hamedi
Hesamodin Mohammadian
R. Razavi-Far
Ali Ghorbani
34
0
0
07 Apr 2025
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González
Andrés Herrera-Poyatos
Cristina Zuheros
David Herrera-Poyatos
Virilo Tejedor
F. Herrera
AAML
19
0
0
07 Apr 2025
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
Yifan Wang
Runjin Chen
Bolian Li
David Cho
Yihe Deng
Ruqi Zhang
Tianlong Chen
Zhangyang Wang
A. Grama
Junyuan Hong
SyDa
48
0
0
03 Apr 2025
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu
Lulu Tang
Ting Pan
Yuguo Yin
Bin Wang
Ao Yang
MLLM
AAML
40
0
0
02 Apr 2025
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Shayne Longpre
Kevin Klyman
Ruth E. Appel
Sayash Kapoor
Rishi Bommasani
...
Victoria Westerhoff
Yacine Jernite
Rumman Chowdhury
Percy Liang
Arvind Narayanan
ELM
40
0
0
21 Mar 2025
Deep Contrastive Unlearning for Language Models
Estrid He
Tabinda Sarwar
Ibrahim Khalil
X. Yi
Ke Wang
MU
51
0
0
19 Mar 2025
Can Language Models Follow Multiple Turns of Entangled Instructions?
Chi Han
ELM
LRM
40
1
0
17 Mar 2025
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
Wenlong Meng
Fan Zhang
Wendao Yao
Zhenyuan Guo
Y. Li
Chengkun Wei
Wenzhi Chen
AAML
36
1
0
11 Mar 2025
AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management
Junyuan Mao
Fanci Meng
Yifan Duan
Miao Yu
X. Jia
Junfeng Fang
Yuxuan Liang
K. Wang
Qingsong Wen
LLMAG
AAML
39
1
0
06 Mar 2025
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs
Shiyu Xiang
Ansen Zhang
Yanfei Cao
Yang Fan
Ronghao Chen
AAML
60
0
0
26 Feb 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
45
0
0
24 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
J. Liu
41
0
0
23 Feb 2025
Towards User-level Private Reinforcement Learning with Human Feedback
J. Zhang
Mingxi Lei
Meng Ding
Mengdi Li
Zihang Xiang
Difei Xu
Jinhui Xu
Di Wang
36
0
0
22 Feb 2025
Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models
Qingsong Zou
Jingyu Xiao
Qing Li
Zhi Yan
Y. Wang
Li Xu
Wenxuan Wang
Kuofeng Gao
Ruoyu Li
Yong-jia Jiang
AAML
81
0
0
21 Feb 2025
Revealing and Mitigating Over-Attention in Knowledge Editing
Pinzheng Wang
Zecheng Tang
Keyan Zhou
J. Li
Qiaoming Zhu
M. Zhang
KELM
115
2
0
21 Feb 2025
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos
Yang Yao
Xuan Tong
Ruofan Wang
Yixu Wang
Lujundong Li
Liang Liu
Yan Teng
Y. Wang
LRM
43
2
0
19 Feb 2025
KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang
Kwan Ho Ryan Chan
D. Thaker
Jinqi Luo
René Vidal
AAML
36
0
0
05 Feb 2025
Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang
Yixin Wu
Rui Wen
Michael Backes
Yang Zhang
53
1
0
03 Feb 2025
Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models
Y. Gong
Zhuo Chen
Miaokun Chen
Fengchang Yu
Wei-Tsung Lu
XiaoFeng Wang
Xiaozhong Liu
J. Liu
AAML
SILM
56
0
0
03 Feb 2025
FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models
Zhuo Chen
Y. Gong
Miaokun Chen
Haotan Liu
Qikai Cheng
Fan Zhang
Wei-Tsung Lu
Xiaozhong Liu
J. Liu
XiaoFeng Wang
AAML
37
1
0
06 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
59
17
0
31 Dec 2024
Prompt-based Unifying Inference Attack on Graph Neural Networks
Yuecen Wei
Xingcheng Fu
Lingyun Liu
Qingyun Sun
Hao Peng
Chunming Hu
AAML
74
0
0
20 Dec 2024
JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
H. Li
Jiawei Ye
Jie Wu
Tianjie Yan
Chu Wang
Zhixin Li
AAML
67
0
0
20 Dec 2024
Towards Action Hijacking of Large Language Model-based Agent
Yuyang Zhang
Kangjie Chen
Xudong Jiang
Yuxiang Sun
Run Wang
Lina Wang
LLMAG
AAML
73
2
0
14 Dec 2024
RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks
Changyue Jiang
Xudong Pan
Geng Hong
Chenfu Bao
Min Yang
SILM
69
7
0
21 Nov 2024
AIDBench: A benchmark for evaluating the authorship identification capability of large language models
Zichen Wen
Dadi Guo
Huishuai Zhang
67
0
0
20 Nov 2024
Person Segmentation and Action Classification for Multi-Channel Hemisphere Field of View LiDAR Sensors
Svetlana Seliunina
Artem Otelepko
Raphael Memmesheimer
Sven Behnke
28
0
0
17 Nov 2024
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
Xuannan Liu
Xing Cui
Peipei Li
Zekun Li
Huaibo Huang
Shuhan Xia
Miaoxuan Zhang
Yueying Zou
Ran He
AAML
58
6
0
14 Nov 2024
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Meng Yang
Tianqing Zhu
Chi Liu
Wanlei Zhou
Shui Yu
Philip S. Yu
AAML
ELM
PILM
48
1
0
12 Nov 2024
Diversity Helps Jailbreak Large Language Models
Weiliang Zhao
Daniel Ben-Levi
Wei Hao
Junfeng Yang
Chengzhi Mao
AAML
57
0
0
06 Nov 2024
ProTransformer: Robustify Transformers via Plug-and-Play Paradigm
Zhichao Hou
Weizhi Gao
Yuchen Shen
Feiyi Wang
Xiaorui Liu
VLM
21
2
0
30 Oct 2024
CLEAR: Towards Contextual LLM-Empowered Privacy Policy Analysis and Risk Generation for Large Language Model Applications
Chaoran Chen
Daodao Zhou
Yanfang Ye
Toby Jia-jun Li
Yaxing Yao
AILaw
28
3
0
17 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
Z. Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Junfeng Fang
Yongbin Li
50
5
0
17 Oct 2024
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning
Jared Joselowitz
Arjun Jagota
Satyapriya Krishna
Sonali Parbhoo
Nyal Patel
19
0
0
16 Oct 2024
Cognitive Overload Attack: Prompt Injection for Long Context
Bibek Upadhayay
Vahid Behzadan
Amin Karbasi
AAML
28
2
0
15 Oct 2024
DAWN: Designing Distributed Agents in a Worldwide Network
Zahra Aminiranjbar
Jianan Tang
Qiudan Wang
Shubha Pant
Mahesh Viswanathan
LLMAG
AI4CE
23
1
0
11 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
66
1
0
09 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
Tianyu Wu
Lingrui Mei
Ruibin Yuan
Lujun Li
Wei Xue
Yike Guo
33
1
0
04 Oct 2024
Permissive Information-Flow Analysis for Large Language Models
Shoaib Ahmed Siddiqui
Radhika Gaonkar
Boris Köpf
David M. Krueger
Andrew J. Paverd
Ahmed Salem
Shruti Tople
Lukas Wutschitz
Menglin Xia
Santiago Zanella Béguelin
18
1
0
04 Oct 2024
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks
Rongchang Li
Minjie Chen
Chang Hu
Han Chen
Wenpeng Xing
Meng Han
SILM
ELM
26
1
0
29 Sep 2024
From Deception to Detection: The Dual Roles of Large Language Models in Fake News
Dorsaf Sallami
Yuan-Chen Chang
Esma Aïmeur
26
3
0
25 Sep 2024
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu
Yangguang Shao
Hanwen Miao
Junzheng Shi
SILM
AAML
60
3
0
23 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
42
0
0
19 Sep 2024
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey
Yujia Zhou
Yan Liu
Xiaoxi Li
Jiajie Jin
Hongjin Qian
Zheng Liu
Chaozhuo Li
Zhicheng Dou
Tsung-Yi Ho
Philip S. Yu
3DV
RALM
43
22
0
16 Sep 2024