Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz
arXiv:2302.12173 · SILM · 23 February 2023

Papers citing "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (showing 50 of 289)

Immunization against harmful fine-tuning attacks
Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
AAML · 26 Feb 2024

ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
Hao Wang, Hao Li, Minlie Huang, Lei Sha
AAML · 25 Feb 2024

A Conversational Brain-Artificial Intelligence Interface
Anja Meunier, Michal Robert Zák, Lucas Munz, Sofiya Garkot, Manuel Eder, Jiachen Xu, Moritz Grosse-Wentrup
22 Feb 2024

Coercing LLMs to do and reveal (almost) anything
Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein
AAML · 21 Feb 2024

Generative AI Security: Challenges and Countermeasures
Banghua Zhu, Norman Mu, Jiantao Jiao, David A. Wagner
AAML, SILM · 20 Feb 2024

Query-Based Adversarial Prompt Generation
Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr
AAML, SILM · 19 Feb 2024

SPML: A DSL for Defending Language Models Against Prompt Attacks
Reshabh K Sharma, Vinayak Gupta, Dan Grossman
AAML · 19 Feb 2024

Proving membership in LLM pretraining data via data watermarks
Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia
WaLM · 16 Feb 2024

A StrongREJECT for Empty Jailbreaks
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, ..., Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
15 Feb 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu-Chuan Su, Chaowei Xiao, Huan Sun
AAML, LLMAG · 15 Feb 2024

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
ALM, ELM · 15 Feb 2024

PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin, Norman Mu, David A. Wagner, Alexandre Araujo
ELM · 15 Feb 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
ELM · 14 Feb 2024

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xingang Guo, Fangxu Yu, Huan Zhang, Lianhui Qin, Bin Hu
AAML · 13 Feb 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min-Bin Lin
LLMAG, LM&Ro · 13 Feb 2024

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
SILM · 12 Feb 2024

Whispers in the Machine: Confidentiality in LLM-integrated Systems
Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer
10 Feb 2024

StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, David A. Wagner
SILM, AAML · 09 Feb 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
AAML · 08 Feb 2024

(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice
Inyoung Cheong, King Xia, K. J. Kevin Feng, Quan Ze Chen, Amy X. Zhang
AILaw, ELM · 02 Feb 2024

Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
Pouya Pezeshkpour, Eser Kandogan, Nikita Bhutani, Sajjadur Rahman, Tom Mitchell, Estevam R. Hruschka
LLMAG, LRM · 02 Feb 2024

An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, R. Mukkamala, J. Thatcher
SILM, AAML · 31 Jan 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
AAML · 29 Jan 2024

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
AAML, SILM · 15 Jan 2024

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, ..., Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li
11 Jan 2024

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
Zilong Lin, Jian Cui, Xiaojing Liao, XiaoFeng Wang
06 Jan 2024

A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models
Daniel Wankit Yip, Aysan Esmradi, C. Chan
AAML · 02 Jan 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David A. Wagner
AAML, SyDa · 29 Dec 2023

Towards Auto-Modeling of Formal Verification for NextG Protocols: A Multimodal cross- and self-attention Large Language Model Approach
Jing-Bing Yang, Ying Wang
28 Dec 2023

Exploiting Novel GPT-4 APIs
Kellin Pelrine, Mohammad Taufeeque, Michał Zając, Euan McLean, Adam Gleave
SILM · 21 Dec 2023

Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer
AI4CE · 18 Dec 2023

A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi, Daniel Wankit Yip, C. Chan
AAML · 18 Dec 2023

Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak
Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin
07 Dec 2023

Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
Matteo Gioele Collu, Tom Janssen-Groesbeek, Stefanos Koffas, Mauro Conti, S. Picek
06 Dec 2023

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi
04 Dec 2023

Intrusion Detection System with Machine Learning and Multiple Datasets
Haiyan Xuan, Mohith Manohar
AAML · 04 Dec 2023

Assessing Prompt Injection Risks in 200+ Custom GPTs
Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing
20 Nov 2023

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
15 Nov 2023

Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Zi Yin, Wei Ding, Jia Liu
14 Nov 2023

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao
AAML, LRM · 13 Nov 2023

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
MLLM · 09 Nov 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Rusheb Shah, Quentin Feuillade--Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando
06 Nov 2023

Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David A. Wagner
ALM · 06 Nov 2023

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, ..., Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart J. Russell
02 Nov 2023

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
AAML · 31 Oct 2023

From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude
S. Roy, Poojitha Thota, Krishna Vamsi Naragam, Shirin Nilizadeh
SILM · 29 Oct 2023

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan L. Boyd-Graber
SILM · 24 Oct 2023

Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers
Mosh Levy, Shauli Ravfogel, Yoav Goldberg
24 Oct 2023

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, A. Nenkova, Tong Sun
SILM, AAML · 23 Oct 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
SILM, LLMAG · 19 Oct 2023