
Large Language Model Safety: A Holistic Survey (arXiv:2412.17686)

23 December 2024
Dan Shi, Shangda Wu, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong
Topics: ELM, LM&MA
Links: arXiv (abs) · PDF · HTML · GitHub (24★)

Papers citing "Large Language Model Safety: A Holistic Survey"

20 of 20 papers shown
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz
LLMAG · 02 Dec 2025
SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang
10 Nov 2025
LLM Unlearning with LLM Beliefs
Kemou Li, Qizhou Wang, Y. Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou
MU, KELM · 22 Oct 2025
QGraphLIME - Explaining Quantum Graph Neural Networks
Haribandhu Jena, Jyotirmaya Shivottam, Subhankar Mishra
FAtt · 07 Oct 2025
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Muhao Chen
AAML · 04 Sep 2025
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He
HILM, LRM · 04 Sep 2025
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen
ELM · 17 Aug 2025
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Kimberly Le Truong, Riccardo Fogliato, Hoda Heidari, Zhiwei Steven Wu
29 Jul 2025
Are Bias Evaluation Methods Biased?
Lina Berrayana, Sean Rooney, Luis Garces-Erice, Ioana Giurgiu
ELM · 20 Jun 2025
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Hui Wei, Dong Yoon Lee, Shubham Rohal, Zhizhang Hu, Ryan Rossi, Shiwei Fang, Shijia Pan
13 Jun 2025
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua
LLMSV · 08 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau
LM&MA, AI4CE · 05 Jun 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung, Sangyeon Yoon, Minsuk Kahng, Albert No
LRM, LLMSV · 20 May 2025
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao, Xin Wang, Yang Yao, Yan Teng, Jiabo He, Yingchun Wang, Yu-Gang Jiang
17 May 2025
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen, Yongjiang Wu, Jing Zhang, Shu Yang, Jen-tse Huang, Wenxuan Wang, S. Wang
ELM · 16 May 2025
Safety in Large Reasoning Models: A Survey
Cheng Wang, Wenshu Fan, Yangqiu Song, Duzhen Zhang, Hao Sun, ..., Shengju Yu, Xinfeng Li, Junfeng Fang, Jiaheng Zhang, Bryan Hooi
LRM · 24 Apr 2025
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff, Sarah Fabi
ReLM, ELM, LRM · 14 Apr 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
AAML · 03 Mar 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He
DeLMO · 07 Feb 2025
CollabLLM: From Passive Responders to Active Collaborators
Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, J. Leskovec, Jianfeng Gao
02 Feb 2025