Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.19737
Cited By
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
30 October 2023
Leo Schwinn
David Dobre
Stephan Günnemann
Gauthier Gidel
AAML
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Adversarial Attacks and Defenses in Large Language Models: Old and New Threats"
13 / 13 papers shown
Title
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
50
0
0
02 May 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An
Shiyue Zhang
Mark Dredze
61
0
0
25 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
56
0
0
01 Apr 2025
Fast Proxies for LLM Robustness Evaluation
Tim Beyer
Jan Schuchardt
Leo Schwinn
Stephan Günnemann
AAML
39
0
0
14 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
83
3
0
03 Feb 2025
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten
Stephan Günnemann
Leo Schwinn
MU
55
6
0
04 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
56
10
0
30 Sep 2024
Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness
Xiaojing Fan
Chunliang Tao
AAML
37
28
0
08 Aug 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Gauthier Gidel
Stephan Gunnemann
AAML
47
36
0
14 Feb 2024
Attacking Large Language Models with Projected Gradient Descent
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
AAML
SILM
42
49
0
14 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
Hui Xiong
AI4CE
77
15
0
30 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
76
0
25 Jan 2024
Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification
Leo Schwinn
Leon Bungert
A. Nguyen
René Raab
Falk Pulsmeyer
Doina Precup
Björn Eskofier
Dario Zanca
OOD
42
12
0
19 May 2022
1