GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

21 February 2024
Yueqi Xie
Minghong Fang
Renjie Pi
Neil Zhenqiang Gong

Papers citing "GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis"

6 / 6 papers shown
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet
Xiao Huang
Dennis Jacob
Annabella Chow
Maha Alrashed
Geng Zhao
Zhanhao Hu
Chawin Sitawarin
Basel Alomair
David A. Wagner
AAML
60
0
0
28 Apr 2025
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian
Shenzhe Zhu
Yuehan Qin
Li Li
Z. Wang
Chaowei Xiao
Yue Zhao
18
0
0
03 Apr 2025
UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
Wenyu Wang
M. Zhang
Xiaotian Ye
Z. Z. Ren
Z. Chen
Pengjie Ren
MU
KELM
67
0
0
06 Mar 2025
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong
Wenbo Hu
Wei Xu
Tianxing He
67
0
0
19 Dec 2024
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao
Kejiang Chen
W. Zhang
Nenghai Yu
AAML
36
0
0
03 Nov 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022