Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.02630
Cited By
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
4 June 2024
Zehang Deng
Yongjian Guo
Changzhou Han
Wanlun Ma
Junwu Xiong
Sheng Wen
Yang Xiang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways"
24 / 24 papers shown
Title
From Glue-Code to Protocols: A Critical Analysis of A2A and MCP Integration for Scalable Agent Systems
Qiaomu Li
Ying Xie
24
0
0
06 May 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An
Shiyue Zhang
Mark Dredze
54
0
0
25 Apr 2025
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace
Kai Y. Xiao
R. Leike
Lilian Weng
Johannes Heidecke
Alex Beutel
SILM
47
113
0
19 Apr 2024
What Was Your Prompt? A Remote Keylogging Attack on AI Assistants
Roy Weiss
Daniel Ayzenshteyn
Guy Amit
Yisroel Mirsky
46
12
0
14 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
42
72
0
05 Mar 2024
WIPI: A New Web Threat for LLM-Driven Web Agents
Fangzhou Wu
Shutong Wu
Yulong Cao
Chaowei Xiao
LLMAG
32
17
0
26 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min-Bin Lin
LLMAG
LM&Ro
35
47
0
13 Feb 2024
The Future of Cognitive Strategy-enhanced Persuasive Dialogue Agents: New Perspectives and Trends
Mengqi Chen
Bin Guo
Hao Wang
Haoyu Li
Qian Zhao
Jingqi Liu
Yasan Ding
Yan Pan
Zhiwen Yu
LLMAG
25
1
0
07 Feb 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
Mintong Kang
Nezihe Merve Gürel
Ning Yu
D. Song
Bo-wen Li
76
20
0
05 Feb 2024
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
Zelong Li
Wenyue Hua
Hao Wang
He Zhu
Yongfeng Zhang
LLMAG
59
18
0
01 Feb 2024
Combating Adversarial Attacks with Multi-Agent Debate
Steffi Chern
Zhen Fan
Andy Liu
AAML
24
5
0
11 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
50
22
0
11 Jan 2024
Detection and Defense Against Prominent Attacks on Preconditioned LLM-Integrated Virtual Assistants
C. Chan
Daniel Wankit Yip
Aysan Esmradi
14
1
0
02 Jan 2024
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
37
14
0
22 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
207
178
0
20 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
138
139
0
16 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
Haoran Li
Mingshi Xu
Yangqiu Song
57
26
0
04 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
206
1,701
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
Toxicity Detection with Generative Prompt-based Inference
Yau-Shian Wang
Y. Chang
69
34
0
24 May 2022
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
Tianlong Chen
Zhenyu (Allen) Zhang
Yihua Zhang
Shiyu Chang
Sijia Liu
Zhangyang Wang
AAML
36
25
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
236
191
0
15 Sep 2021
1