Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
arXiv:2308.10819 · 17 August 2023
Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan
ELM, SILM, AAML

Papers citing "Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection" (21 papers shown)

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Y. Chen, Haoran Li, Yuan Sui, Y. Liu, Yufei He, Y. Song, Bryan Hooi
AAML, SILM
61 · 0 · 0 · 29 Apr 2025

Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
Antonios Tragoudaras, Theofanis Aslanidis, Emmanouil Georgios Lionis, Marina Orozco González, Panagiotis Eustratiadis
MIACV, SILM
51 · 0 · 0 · 23 Apr 2025

PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Yang Jiao, X. Wang, Kai Yang
AAML, SILM
31 · 0 · 0 · 10 Apr 2025

Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li, H. Wang, Jiang Wu, Ting Liu
AAML
26 · 0 · 0 · 08 Apr 2025

How does Watermarking Affect Visual Language Models in Document Understanding?
Chunxue Xu, Yiwei Wang, Bryan Hooi, Yujun Cai, Songze Li
VLM
44 · 0 · 0 · 01 Apr 2025

DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
Dengyun Peng, Yuhang Zhou, Qiguang Chen, Jinhao Liu, Jingjing Chen, L. Qin
50 · 0 · 0 · 17 Mar 2025

Can Indirect Prompt Injection Attacks Be Detected and Removed?
Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Y. Song, Bryan Hooi
AAML
40 · 3 · 0 · 23 Feb 2025

Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen, Haoran Li, Zihao Zheng, Y. Song, Dekai Wu, Bryan Hooi
SILM, AAML
47 · 4 · 0 · 01 Nov 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Philip H. S. Torr, Shay B. Cohen, David M. Krueger, Fazl Barez
AAML
34 · 0 · 0 · 11 Oct 2024

Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja, Nilay Pochhi
AAML
46 · 1 · 0 · 09 Oct 2024

LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models
Lipeng Ma, Weidong Yang, Sihang Jiang, Ben Fei, Mingjie Zhou, Shuhao Li, Bo Xu, Bo Xu, Yanghua Xiao
49 · 0 · 0 · 03 Sep 2024

Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao
AAML, LLMAG
22 · 1 · 0 · 05 Aug 2024

Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh Khapra
34 · 11 · 0 · 19 Jun 2024

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra
SILM, AAML
29 · 3 · 0 · 10 Jun 2024

An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, R. Mukkamala, J. Thatcher
SILM, AAML
11 · 16 · 0 · 31 Jan 2024

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Eric Sun, Yue Zhang
PILM, ELM
24 · 463 · 0 · 04 Dec 2023

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
9 · 25 · 0 · 15 Nov 2023

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng
56 · 23 · 0 · 01 Nov 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
PILM
38 · 39 · 0 · 16 Oct 2023

Instruction Tuning with GPT-4
Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao
SyDa, ALM, LM&MA
157 · 576 · 0 · 06 Apr 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
301 · 11,730 · 0 · 04 Mar 2022