Multi-step Jailbreaking Privacy Attacks on ChatGPT
11 April 2023
arXiv: 2304.05197
Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, Yangqiu Song
Tags: SILM
Papers citing "Multi-step Jailbreaking Privacy Attacks on ChatGPT"
50 / 235 papers shown

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
Tags: PILM, AAML
05 Sep 2024

Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)
Alan Aqrawi, Arian Abbasi
Tags: AAML
04 Sep 2024

TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models
Zelin Li, Kehai Chen, Lemao Liu, Xuefeng Bai, Mingming Yang, Yang Xiang, Min Zhang
Tags: AAML
26 Aug 2024

LLM-PBE: Assessing Data Privacy in Large Language Models
Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, ..., Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song
Tags: ELM, PILM
23 Aug 2024

Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions
Jinxin Liu, Zao Yang
20 Aug 2024

Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory
Haoran Li, Wei Fan, Yulin Chen, Jiayang Cheng, Tianshu Chu, Xuebing Zhou, Peizhao Hu, Yangqiu Song
Tags: AILaw
19 Aug 2024

WPN: An Unlearning Method Based on N-pair Contrastive Learning in Language Models
Guitao Chen, Yunshen Wang, Hongye Sun, Guang Chen
Tags: MU
18 Aug 2024

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen, Yi Liu, Dongxia Wang, Jiaying Chen, Wenhai Wang
18 Aug 2024

Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks
Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Weiming Zhang
Tags: AAML
15 Aug 2024

Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Hui Li
Tags: AAML
08 Aug 2024

Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
Anqi Zhang, Chaofeng Wu
30 Jul 2024

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
Tags: LLMAG, AAML
30 Jul 2024

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali, Jia He, C. Barberan, Richard Anarfi
30 Jul 2024

The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, Philip S. Yu
Tags: PILM, LLMAG, ELM
28 Jul 2024

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
Huiyu Xu, Wenhui Zhang, Zhibo Wang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren
Tags: AAML, LLMAG
23 Jul 2024

LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han
23 Jul 2024

Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu
Tags: AAML
18 Jul 2024

Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models
Dong Shu, Mingyu Jin, Tianle Chen, Chong Zhang, Yongfeng Zhang
Tags: ELM, SILM
12 Jul 2024

MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
Tags: MU, ELM
08 Jul 2024

AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao
Tags: LM&MA
06 Jul 2024

Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li
Tags: AAML
05 Jul 2024

JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin, Shiyi Liu, Haotian Li, Xun Zhao, Huamin Qu
03 Jul 2024

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Zhexin Zhang, Junxiao Yang, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang
Tags: AAML, MU
03 Jul 2024

The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, ..., Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
02 Jul 2024

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Xiaotian Zou, Ke Li, Yongkang Chen
Tags: MLLM
01 Jul 2024

Poisoned LangChain: Jailbreak LLMs by LangChain
Ziqiu Wang, Jun Liu, Shengkai Zhang, Yang Yang
26 Jun 2024

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang
Tags: PILM
26 Jun 2024

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei
21 Jun 2024

The Fire Thief Is Also the Keeper: Balancing Usability and Privacy in Prompts
Zhili Shen, Zihang Xi, Ying He, Wei Tong, Jingyu Hua, Sheng Zhong
Tags: SILM
20 Jun 2024

Prompt Injection Attacks in Defended Systems
Daniil Khomsky, Narek Maloyan, Bulat Nutfullin
Tags: AAML, SILM
20 Jun 2024

Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications
Stephen Burabari Tete
16 Jun 2024

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach, Martin Tutek, Yonatan Belinkov
Tags: KELM, MU
13 Jun 2024

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu
Tags: PILM
12 Jun 2024

We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
Joseph Spracklen, Raveen Wijewickrama, A. H. M. N. Sakib, Anindya Maiti, Murtuza Jadliwala
12 Jun 2024

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, ..., Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang
Tags: ELM
11 Jun 2024

SecureNet: A Comparative Study of DeBERTa and Large Language Models for Phishing Detection
Sakshi Mahendru, Tejul Pandit
10 Jun 2024

Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis
Priyanka Nanayakkara, Hyeok Kim, Yifan Wu, Ali Sarvghad, Narges Mahyar, G. Miklau, Jessica Hullman
04 Jun 2024

AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang
04 Jun 2024

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
30 May 2024

Voice Jailbreak Attacks Against GPT-4o
Xinyue Shen, Yixin Wu, Michael Backes, Yang Zhang
Tags: AuLLM
29 May 2024

ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan, Ling Liu, Lei Xu, S. Bodapati, Dan Roth
Tags: ELM
28 May 2024

Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker, Jan Philip Wahle, Bela Gipp, Terry Ruas
24 May 2024

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data
Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang
23 May 2024

Navigating LLM Ethics: Advancements, Challenges, and Future Directions
Junfeng Jiao, S. Afroogh, Yiming Xu, Connor Phillips
Tags: AILaw
14 May 2024

Risks of Practicing Large Language Models in Smart Grid: Threat Modeling and Validation
Jiangnan Li, Yingyuan Yang, Jinyuan Stella Sun
10 May 2024

How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
Jorge García-Carrasco, Alejandro Maté, Juan Trujillo
07 May 2024

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang
06 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, ..., Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, Yanghua Xiao
28 Apr 2024

Online Personalizing White-box LLMs Generation with Neural Bandits
Zekai Chen, Weeden Daniel, Po-yu Chen, Francois Buet-Golfouse
24 Apr 2024