Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.06561
Cited By
Intention Analysis Makes LLMs A Good Jailbreak Defender
12 January 2024
Yuqi Zhang
Liang Ding
Lefei Zhang
Dacheng Tao
LLMSV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Intention Analysis Makes LLMs A Good Jailbreak Defender"
14 / 14 papers shown
Title
GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms
Sinan He
An Wang
25
0
0
17 Apr 2025
Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models
He Cao
Weidi Luo
Zijing Liu
Yu Wang
Bing Feng
Yuan Yao
Yuan Yao
Yu Li
AAML
50
2
0
23 Oct 2024
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu
Yue Feng
Zhao Xu
Lixin Su
Xinyu Ma
Dawei Yin
Hao Liu
ELM
22
7
0
11 Oct 2024
Know Your Limits: A Survey of Abstention in Large Language Models
Bingbing Wen
Jihan Yao
Shangbin Feng
Chenjun Xu
Yulia Tsvetkov
Bill Howe
Lucy Lu Wang
49
5
0
25 Jul 2024
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi
Yule Liu
Zhen Sun
Tianshuo Cong
Xinlei He
Jiaxing Song
Ke Xu
Qi Li
AAML
34
77
0
05 Jul 2024
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin
Leyang Hu
Xinuo Li
Peiyan Zhang
Chonghan Chen
Jun Zhuang
Haohan Wang
PILM
36
26
0
26 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran
Jinyuan Liu
Yichen Gong
Jingyi Zheng
Xinlei He
Tianshuo Cong
Anyu Wang
ELM
42
10
0
13 Jun 2024
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
Chen Xiong
Xiangyu Qi
Pin-Yu Chen
Tsung-Yi Ho
AAML
26
18
0
30 May 2024
Can Large Language Models Understand Context?
Yilun Zhu
Joel Ruben Antony Moniz
Shruti Bhargava
Jiarui Lu
Dhivya Piraviperumal
Site Li
Yuan-kang Zhang
Hong-ye Yu
Bo-Hsiang Tseng
51
18
0
01 Feb 2024
Revisiting Demonstration Selection Strategies in In-Context Learning
Keqin Peng
Liang Ding
Yancheng Yuan
Xuebo Liu
Min Zhang
Y. Ouyang
Dacheng Tao
19
20
0
22 Jan 2024
Towards Making the Most of ChatGPT for Machine Translation
Keqin Peng
Liang Ding
Qihuang Zhong
Li Shen
Xuebo Liu
Min Zhang
Y. Ouyang
Dacheng Tao
LRM
81
203
0
24 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
216
327
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
1