arXiv:2406.18118
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
26 June 2024
Caishuang Huang
Wanxu Zhao
Rui Zheng
Huijie Lv
Shihan Dou
Sixian Li
Xiao Wang
Enyu Zhou
Junjie Ye
Yuming Yang
Tao Gui
Qi Zhang
Xuanjing Huang
LLMSV
AAML
Papers citing "SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance" (5 of 5 papers shown)
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond. Shanshan Han. 09 Oct 2024.
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level. Xinyi Zeng, Yuying Shang, Yutao Zhu, Jingyuan Zhang, Yu Tian. 09 Oct 2024. [AAML]
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding. Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran. 14 Feb 2024. [AAML]
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems. Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, ..., Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li. 11 Jan 2024.
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts. Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing. 19 Sep 2023. [SILM]