Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models

3 March 2024

Arijit Ghosh Chowdhury

Md. Mofijul Islam

Vinija Jain

Papers citing "Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models"

Single-pass Detection of Jailbreaking Input in Large Language Models. Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios G. Chrysos, V. Cevher. 24 Feb 2025.
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks. Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah Goldblum. 12 Feb 2025.
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities. Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, ..., Zikui Cai, Bilal Chughtai, Y. Gal, Furong Huang, Dylan Hadfield-Menell. 03 Feb 2025.
Recent Advances in Attack and Defense Approaches of Large Language Models. Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang. 05 Sep 2024.
Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield. Jinhwa Kim, Ali Derakhshan, Ian G. Harris. 31 Oct 2023.
Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models. Shuai Zhao, Jinming Wen, Anh Tuan Luu, J. Zhao, Jie Fu. 02 May 2023.