Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

16 October 2023

Papers citing "Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks"

20 / 20 papers shown

Attack and defense techniques in large language models: A survey and new perspectives Zhiyu Liao Kang Chen Yuanguo Lin Kangkang Li Yunxuan Liu Hefeng Chen Xingwang Huang Yuanhui Yu AAML 21 0 0 02 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures Francisco Aguilera-Martínez Fernando Berzal PILM 4 0 0 02 May 2025
OET: Optimization-based prompt injection Evaluation Toolkit Jinsheng Pan Xiaogeng Liu Chaowei Xiao AAML 41 0 0 01 May 2025
Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation Bikash Saha Nanda Rani Sandeep K. Shukla 31 0 0 30 Apr 2025
Towards Building a Robust Toxicity Predictor Dmitriy Bespalov Sourav S. Bhabesh Yi Xiang Liutong Zhou Yanjun Qi AAML 77 6 0 09 Apr 2024
On the Adversarial Robustness of Multi-Modal Foundation Models Christian Schlarmann Matthias Hein AAML 55 45 0 21 Aug 2023
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trebacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 204 413 0 28 Sep 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action Dhruv Shah B. Osinski Brian Ichter Sergey Levine LM&Ro 127 308 0 10 Jul 2022
Large Language Models are Zero-Shot Reasoners Takeshi Kojima S. Gu Machel Reid Yutaka Matsuo Yusuke Iwasawa ReLM LRM 271 2,712 0 24 May 2022
Teaching language models to support answers with verified quotes Jacob Menick Maja Trebacz Vladimir Mikulik John Aslanides Francis Song ... Mia Glaese Susannah Young Lucy Campbell-Gillingham G. Irving Nat McAleese ELM RALM 197 204 0 21 Mar 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 273 8,441 0 04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 276 5,177 0 28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika ... T. Bers Stella Biderman Leo Gao Thomas Wolf Alexander M. Rush LRM 192 1,436 0 15 Oct 2021
Challenges in Detoxifying Language Models Johannes Welbl Amelia Glaese J. Uesato Sumanth Dathathri John F. J. Mellor Lisa Anne Hendricks Kirsty Anderson Pushmeet Kohli Ben Coppin Po-Sen Huang LM&MA 210 156 0 15 Sep 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models Alex Tamkin Miles Brundage Jack Clark Deep Ganguli AILaw ELM 159 198 0 04 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 202 1,508 0 31 Dec 2020
Extracting Training Data from Large Language Models Nicholas Carlini Florian Tramèr Eric Wallace Matthew Jagielski Ariel Herbert-Voss ... Tom B. Brown D. Song Ulfar Erlingsson Alina Oprea Colin Raffel MLAU SILM 243 1,386 0 14 Dec 2020
A Survey on Bias and Fairness in Machine Learning Ninareh Mehrabi Fred Morstatter N. Saxena Kristina Lerman Aram Galstyan SyDa FaML 270 3,347 0 23 Aug 2019
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks Mohit Iyyer John Wieting Kevin Gimpel Luke Zettlemoyer AAML GAN 154 655 0 17 Apr 2018
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks Guy Katz Clark W. Barrett D. Dill Kyle D. Julian Mykel Kochenderfer AAML 192 1,714 0 03 Feb 2017