Security Issues in Language Models

SILM

LLM security is the investigation of the failure modes of LLMs in use, the conditions that lead to them, and their mitigations. The failure modes include the vulnerabilities of LLM to leak sensitive information or inappropriate contents, inclusion of trojan samples on the web such that an LLM is trained on them to eventually show inappropriate or dangerous behaviours at their deployment, or various potential misuse of LLMs to cause harms and pursue illegal activities.

Neighbor communities

51015

Featured Papers

0 / 0 papers shown

All papers

50 / 985 papers shown

Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains Xiaochong Jiang Shiqi Yang Wenting Yang Yichen Liu Cheng Ji SILM 1 0 0 23 Feb 2026
Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians Kartik Chandra Max Kleiman-Weiner Jonathan Ragan-Kelley Joshua B. Tenenbaum SILM 18 0 0 22 Feb 2026
FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators Darsh Asher Farshad Dizani Joshua Kalyanapu Rosario Cammarota Aydin Aysu Samira Mirbagher Ajorpaz SILM 22 0 0 20 Feb 2026
The Vulnerability of LLM Rankers to Prompt Injection Attacks Yu Yin Shuai Wang Bevan Koopman Guido Zuccon SILM AAML 25 0 0 18 Feb 2026
Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation Xinguo Feng Zhongkui Ma Zihan Wang Alsharif Abuadbba Guangdong Bai AAML SILM 16 0 0 11 Feb 2026
The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis Peiran Wang Xinfeng Li Chong Xiang Jinghuai Zhang Ying Li Lixia Zhang Xiaofeng Wang Yuan Tian LLMAG AAML SILM 47 0 0 11 Feb 2026
Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation Zhisheng Qi Utkarsh Sahu Li Ma Haoyu Han Ryan Rossi ... Nesreen Ahmed Yushun Dong Yue Zhao Yu Zhang Yu Wang SILM 19 0 0 10 Feb 2026
Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders Zhuxin Lei Ziyuan Yang Yi Zhang AAML SILM 31 0 0 10 Feb 2026
Towards Poisoning Robustness Certification for Natural Language Generation Mihnea Ghitu Matthew Wicker AAML SILM 17 0 0 10 Feb 2026
One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning Kotekar Annapoorna Prabhu Andrew Gan Zahra Ghodsi AAML SILM 59 0 0 09 Feb 2026
Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation Shayan Ali Hassan Tao Ni Zafar Ayyub Qazi Marco Canini AAML SILM 35 0 0 08 Feb 2026
BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron Abdullah Arafat Miah Kevin Vu Yu Bi AAML SILM 26 0 0 06 Feb 2026
"Tab, Tab, Bug'': Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs Yunlong Lyu Yixuan Tang Peng Chen Tian Dong Xinyu Wang Zhiqiang Dong Hao Chen SILM AAML 71 0 0 06 Feb 2026
MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs Junhyeok Lee Han Jang Kyu Sung Choi AAML SILM LM&MA 36 0 0 06 Feb 2026
Learning to Inject: Automated Prompt Injection via Reinforcement Learning Xin Chen Jie Zhang Florian Tramer LLMAG SILM 23 0 0 05 Feb 2026
BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models Zihan Wang Hongwei Li Rui Zhang Wenbo Jiang Guowen Xu SILM 17 0 0 05 Feb 2026
Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates Ariel Fogel Omer Hofman Eilon Cohen Roman Vainshtein SILM 81 0 0 04 Feb 2026
Semantic-level Backdoor Attack against Text-to-Image Diffusion Models Tianxin Chen Wenbo Jiang Hongqiao Chen Zhirun Zheng Cheng Huang DiffM SILM 27 0 0 03 Feb 2026
Protecting Private Code in IDE Autocomplete using Differential Privacy Evgeny Grigorenko David Stanojević David Ilić Egor Bogomolov Kostadin Cvejoski SILM AAML 19 0 0 30 Jan 2026
Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs Xiang Zheng Yutao Wu Hanxun Huang Yige Li Xingjun Ma Bo Li Yu-Gang Jiang Cong Wang SILM AAML ELM 90 0 0 29 Jan 2026
A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy Pedro H. Barcha Correia Ryan W. Achjian Diego E. G. Caetano de Oliveira Ygor Acacio Maria Victor Takashi Hayashi Marcos Lopes Charles Christian Miers Marcos A. Simplicio Jr AAML SILM 26 0 0 29 Jan 2026
Adversarial Vulnerability Transcends Computational Paradigms: Feature Engineering Provides No Defense Against Neural Adversarial Transfer Achraf Hsain Ahmed Abdelkader Emmanuel Baldwin Mbaya Hamoud Aljamaan AAML SILM 36 0 0 29 Jan 2026
Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models Anmol Goel Cornelius Emde Sangdoo Yun Seong Joon Oh Martin Gubri PILM SILM 63 0 1 21 Jan 2026
PINA: Prompt Injection Attack against Navigation Agents Jiani Liu Yixin He Lanlan Fan Qidi Zhong Yushi Cheng Meng Zhang Yanjiao Chen Wenyuan Xu LLMAG AAML SILM 62 0 0 20 Jan 2026
RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models Rishit Chugh SILM AAML 80 0 0 20 Jan 2026
CODE: A Contradiction-Based Deliberation Extension Framework for Overthinking Attacks on Retrieval-Augmented Generation Xiaolei Zhang Xiaojun Jia Liquan Chen Songze Li SILM LRM 42 0 0 19 Jan 2026
SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation Aiman Al Masoud Marco Arazzi Antonino Nocera SILM 43 0 0 16 Jan 2026
Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG Haoze Guo Ziqi Wei SILM 166 0 0 16 Jan 2026
Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection Yuansen Liu Yixuan Tang Anthony Kum Hoe Tun AAML SILM LRM 130 0 0 15 Jan 2026
CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning Yuanjie Zhao Junnan Qiu Yue Ding Jie Li AAML OffRL SILM 58 0 0 15 Jan 2026
The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism Oleg Brodt Elad Feldman Bruce Schneier Ben Nassi SILM 245 0 0 14 Jan 2026
How Secure is Secure Code Generation? Adversarial Prompts Put LLM Defenses to the Test Melissa Tessa Iyiola E. Olatunji Aicha War Jacques Klein Tegawendé F. Bissyandé AAML SILM ELM 84 0 0 11 Jan 2026
Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems Hongyan Chang Ergute Bao Xinjian Luo Ting Yu SILM RALM 193 0 0 11 Jan 2026
SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems Andreea-Elena Bodea Stephen Meisenbacher Alexandra Klymenko Florian Matthes SILM 218 0 0 07 Jan 2026
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI Srija Mukhopadhyay Sathwik Reddy Shruthi Muthukumar Jisun An Ponnurangam Kumaraguru SILM 296 0 0 31 Dec 2025
Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark Manu Yi Guo Kanchana Thilakarathna Nirhoshan Sivaroopan Jo Plested Tim Lynar Jack Yang Wangli Yang SILM 76 0 0 29 Dec 2025
Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation Tian Li Bo Lin Shangwen Wang Yusong Tan SILM 72 0 0 25 Dec 2025
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected Kanchon Gharami Sanjiv Kumar Sarkar Yongxin Liu Shafika Showkat Moni SILM 289 0 0 23 Dec 2025
SoK: Understanding (New) Security Issues Across AI4Code Use Cases Qilong Wu Taoran Li Tianyang Zhou Varun Chandrasekaran SILM SyDa 305 0 0 20 Dec 2025
Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks Safwan Shaheer G.M. Refatul Islam Mohammad Rafid Hamid Tahsin Zaman Jilan AAML SILM 230 0 0 18 Dec 2025
MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval Saksham Sahai Srivastava Haoyu He SILM 120 0 0 18 Dec 2025
Detecting Prompt Injection Attacks Against Application Using Classifiers Safwan Shaheer G. M. Refatul Islam Mohammad Rafid Hamid Md. Abrar Faiaz Khan Md. Omar Faruk Yaseen Nur SILM 227 0 0 14 Dec 2025
Adversarial Robustness in Financial Machine Learning: Defenses, Economic Impact, and Governance Evidence Samruddhi Baviskar AAML SILM 142 0 0 14 Dec 2025
One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs Yixin Tan Zhe Yu Jun Sakuma AAML SILM PILM 243 0 0 14 Dec 2025
Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously Andrew Adiletta Kathryn Adiletta Kemal Derya Berk Sunar AAML SILM 198 0 0 12 Dec 2025
S3C2 SICP Summit 2025-06: Vulnerability Response Summit Anna Lena Rotthaler Simon Oberthür Juraj Somorovsky Kirsten Thommes Simon Trang ... William Enck A. Kapravelos Christian Kastner Dominik Wermke Laurie A. Williams SILM 370 0 0 02 Dec 2025
Defense That Attacks: How Robust Models Become Better Attackers Mohamed Awad Mahmoud Akrm Walid Gomaa SILM AAML 421 0 0 02 Dec 2025
Bias Injection Attacks on RAG Databases and Sanitization Defenses Hao Wu Prateek Saxena AAML SILM 354 0 0 30 Nov 2025
Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis Mintong Kang Chong Xiang Sanjay Kariyappa Chaowei Xiao Bo Li Edward Suh SILM AAML 422 0 0 30 Nov 2025
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines Yuhang Wang Yanxu Zhu Dongyuan Lu Jitao Sang AAML SILM ELM LRM 548 0 0 26 Nov 2025

Loading #Papers per Month with "SILM"

Past speakers

Name (-)

Top Contributors

Name (-)

Top Organizations at ResearchTrend.AI

Name (-)

Social Events

Date	Location	Event
No social events available