Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
arXiv 2405.20099 · 30 May 2024
Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho
AAML
Papers citing "Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks" (6 papers shown)
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models
Siyuan Liang, Jiayang Liu, Jiecheng Zhai, Tianmeng Fang, Rongcheng Tu, A. Liu, Xiaochun Cao, Dacheng Tao
VGen · 22 Apr 2025
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu, Yinghui He, Xinzhe Juan, Y. Wang, Y. Liu, Zixin Yao, Yue Wu, Xun Jiang, L. Yang, Mengdi Wang
AI4MH · 13 Apr 2025
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
09 Oct 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel
AAML · 08 Jun 2024
Summarization is (Almost) Dead
Xiao Pu, Mingqi Gao, Xiaojun Wan
HILM · 18 Sep 2023
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022