arXiv: 2402.13494
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
21 February 2024
Yueqi Xie, Minghong Fang, Renjie Pi, Neil Zhenqiang Gong
Papers citing "GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis" (7 of 7 papers shown)
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David A. Wagner
AAML
28 Apr 2025
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian, Shenzhe Zhu, Yuehan Qin, Li Li, Z. Wang, Chaowei Xiao, Yue Zhao
03 Apr 2025
UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
Wenyu Wang, M. Zhang, Xiaotian Ye, Z. Z. Ren, Z. Chen, Pengjie Ren
MU, KELM
06 Mar 2025
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong, Wenbo Hu, Wei Xu, Tianxing He
19 Dec 2024
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao, Kejiang Chen, W. Zhang, Nenghai Yu
AAML
03 Nov 2024
A Framework for Real-time Safeguarding the Text Generation of Large Language Model
Ximing Dong, Dayi Lin, Shaowei Wang, Ahmed E. Hassan
29 Apr 2024
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
04 Mar 2022