Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
arXiv: 2405.20413
30 May 2024
Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
Papers citing "Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters" (5 papers shown)
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang, Maximilian Li, Leonard Tang
AAML · 02 Oct 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, ..., Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
AAML · MU · 01 Aug 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma
ALM · ELM · AAML · 12 Mar 2024
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
Divij Handa
Advait Chirmule
Bimal Gajera
Chitta Baral
Chitta Baral
42
18
0
16 Feb 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu, Mingyu Jin, Suiyuan Zhu, Beichen Wang, Zihao Zhou, Chong Zhang, Yongfeng Zhang
ELM · 17 Jan 2024