Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
arXiv:2407.20529

30 July 2024
Sara Abdali, Jia He, C. Barberan, Richard Anarfi

Papers citing "Can LLMs be Fooled? Investigating Vulnerabilities in LLMs" (4 papers shown)
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts (26 Feb 2024)
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, ..., Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
[SyDa]
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks (16 Oct 2023)
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
[AAML]
On the Risk of Misinformation Pollution with Large Language Models (23 May 2023)
Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, W. Wang
[DeLMO]
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned (23 Aug 2022)
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark