Security Concerns for Large Language Models: A Survey
arXiv:2505.18889, v2 (latest)
24 May 2025
Miles Q. Li, Benjamin C. M. Fung
Tags: PILM, ELM

Papers citing "Security Concerns for Large Language Models: A Survey"

25 / 25 papers shown
Defending against Indirect Prompt Injection by Instruction Detection
Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu
Tags: AAML
08 May 2025

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, M. Guan, Aleksander Mądry, Wojciech Zaremba, J. Pachocki, David Farhi
Tags: LRM
14 Mar 2025

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio, Michael K. Cohen, Damiano Fornasiere, J. Ghosn, Pietro Greiner, ..., Jesse Richardson, Oliver E. Richardson, Marc-Antoine Rondeau, P. St-Charles, David Williams-King
21 Feb 2025

Emerging Security Challenges of Large Language Models
Herve Debar, Sven Dietrich, Pavel Laskov, Emil C. Lupu, Eirini Ntoutsi
Tags: ELM
23 Dec 2024

Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
Aneta Zugecova, Dominik Macko, Ivan Srba, Robert Moro, Jakub Kopal, Katarina Marcincinova, Matus Mesarcik
18 Dec 2024

Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents
Yuyou Gan, Yong Yang, Zhe Ma, Ping He, Rui Zeng, ..., Songze Li, Ting Wang, Yunjun Gao, Yingcai Wu, Shouling Ji
Tags: PILM, LLMAG
14 Nov 2024

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun
Tags: AAML, SILM
23 Aug 2024

Goal-guided Generative Prompt Injection Attack on Large Language Models
Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin
Tags: AAML, SILM
06 Apr 2024

Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu
Tags: HILM
27 Mar 2024

Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications
Stav Cohen, Ron Bitton, Ben Nassi
05 Mar 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Evan Hubinger, Carson E. Denison, Jesse Mu, Mike Lambert, Meg Tong, ..., Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez
Tags: LLMAG
10 Jan 2024

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Eric Sun, Yue Zhang
Tags: PILM, ELM
04 Dec 2023

The Philosopher's Stone: Trojaning Plugins of Large Language Models
Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Shaofeng Li, Yan Meng, Zhen Liu, Haojin Zhu
Tags: AAML
01 Dec 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
Tags: SILM, LLMAG
19 Oct 2023

Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
19 Oct 2023

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
Tags: AAML
16 Oct 2023

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj, Soujanya Poria
Tags: ELM
18 Aug 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
27 Jul 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
Tags: AI4MH, ALM
18 Jul 2023

Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson E. Denison, ..., Zac Hatfield-Dodds, Jared Kaplan, J. Brauner, Sam Bowman, Ethan Perez
Tags: ReLM, LRM
17 Jul 2023

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz
Tags: SILM
23 Feb 2023

BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
Jiawen Shi, Yixin Liu, Pan Zhou, Lichao Sun
Tags: SILM
21 Feb 2023

Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez, Ian Ribeiro
Tags: SILM
17 Nov 2022

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Tags: BDL
28 May 2020

Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
Tags: AAML, SILM
20 Aug 2019