Cited By
Large Language Model Safety: A Holistic Survey (arXiv 2412.17686)
23 December 2024
Dan Shi, Shangda Wu, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong
Tags: ELM, LM&MA
Links: arXiv (abs) · PDF · HTML · GitHub (24★)
Papers citing "Large Language Model Safety: A Holistic Survey" (20 papers shown)
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz · LLMAG · 02 Dec 2025
SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang · 10 Nov 2025
LLM Unlearning with LLM Beliefs
Kemou Li, Qizhou Wang, Y. Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou · MU, KELM · 22 Oct 2025
QGraphLIME - Explaining Quantum Graph Neural Networks
Haribandhu Jena, Jyotirmaya Shivottam, Subhankar Mishra · FAtt · 07 Oct 2025
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Muhao Chen · AAML · 04 Sep 2025
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He · HILM, LRM · 04 Sep 2025
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen · ELM · 17 Aug 2025
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Kimberly Le Truong, Riccardo Fogliato, Hoda Heidari, Zhiwei Steven Wu · 29 Jul 2025
Are Bias Evaluation Methods Biased?
Lina Berrayana, Sean Rooney, Luis Garces-Erice, Ioana Giurgiu · ELM · 20 Jun 2025
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Hui Wei, Dong Yoon Lee, Shubham Rohal, Zhizhang Hu, Ryan Rossi, Shiwei Fang, Shijia Pan · 13 Jun 2025
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua · LLMSV · 08 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau · LM&MA, AI4CE · 05 Jun 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung, Sangyeon Yoon, Minsuk Kahng, Albert No · LRM, LLMSV · 20 May 2025
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao, Xin Wang, Yang Yao, Yan Teng, Jiabo He, Yingchun Wang, Yu-Gang Jiang · 17 May 2025
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen, Yongjiang Wu, Jing Zhang, Shu Yang, Jen-tse Huang, Wenxuan Wang, S. Wang · ELM · 16 May 2025
Safety in Large Reasoning Models: A Survey
Cheng Wang, Wenshu Fan, Yangqiu Song, Duzhen Zhang, Hao Sun, ..., Shengju Yu, Xinfeng Li, Junfeng Fang, Jiaheng Zhang, Bryan Hooi · LRM · 24 Apr 2025
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff, Sarah Fabi · ReLM, ELM, LRM · 14 Apr 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower · AAML · 03 Mar 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He · DeLMO · 07 Feb 2025
CollabLLM: From Passive Responders to Active Collaborators
Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, J. Leskovec, Jianfeng Gao · 02 Feb 2025