Cited By
Large Language Model Safety: A Holistic Survey (arXiv 2412.17686)
23 December 2024
Dan Shi, Shangda Wu, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong
Tags: ELM, LM&MA
Links: arXiv (abs) · PDF · HTML · GitHub (24★)
Papers citing "Large Language Model Safety: A Holistic Survey" (20 papers shown)
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz · LLMAG · 02 Dec 2025
SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang · 10 Nov 2025
LLM Unlearning with LLM Beliefs
Kemou Li, Qizhou Wang, Y. Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou · MU, KELM · 22 Oct 2025
QGraphLIME - Explaining Quantum Graph Neural Networks
Haribandhu Jena, Jyotirmaya Shivottam, Subhankar Mishra · FAtt · 07 Oct 2025
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Muhao Chen · AAML · 04 Sep 2025
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He · HILM, LRM · 04 Sep 2025
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen · ELM · 17 Aug 2025
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Kimberly Le Truong, Riccardo Fogliato, Hoda Heidari, Zhiwei Steven Wu · 29 Jul 2025
Are Bias Evaluation Methods Biased?
Lina Berrayana, Sean Rooney, Luis Garces-Erice, Ioana Giurgiu · ELM · 20 Jun 2025
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Hui Wei, Dong Yoon Lee, Shubham Rohal, Zhizhang Hu, Ryan Rossi, Shiwei Fang, Shijia Pan · 13 Jun 2025
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua · LLMSV · 08 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau · LM&MA, AI4CE · 05 Jun 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung, Sangyeon Yoon, Minsuk Kahng, Albert No · LRM, LLMSV · 20 May 2025
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao, Xin Wang, Yang Yao, Yan Teng, Jiabo He, Yingchun Wang, Yu-Gang Jiang · 17 May 2025
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen, Yongjiang Wu, Jing Zhang, Shu Yang, Jen-tse Huang, Wenxuan Wang, S. Wang · ELM · 16 May 2025
Safety in Large Reasoning Models: A Survey
Cheng Wang, Wenshu Fan, Yangqiu Song, Duzhen Zhang, Hao Sun, ..., Shengju Yu, Xinfeng Li, Junfeng Fang, Jiaheng Zhang, Bryan Hooi · LRM · 24 Apr 2025
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff, Sarah Fabi · ReLM, ELM, LRM · 14 Apr 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower · AAML · 03 Mar 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He · DeLMO · 07 Feb 2025
CollabLLM: From Passive Responders to Active Collaborators
Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, J. Leskovec, Jianfeng Gao · 02 Feb 2025