GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

12 August 2023
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu · SILM

Papers citing "GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher"

50 / 180 papers shown
• Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
  Fan Liu, Zhao Xu, Hao Liu · AAML · 07 Jun 2024
• AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
  Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang · 04 Jun 2024
• Safeguarding Large Language Models: A Survey
  Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang · OffRL, KELM, AILaw · 03 Jun 2024
• Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
  Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min-Bin Lin · AAML · 03 Jun 2024
• Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
  Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Yu-Ching Hsu, Chao-Wei Huang, Jia-Yin Foo, Yun-Nung Chen · LLMAG · 03 Jun 2024
• Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
  Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min-Bin Lin · AAML · 31 May 2024
• Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
  Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing · 31 May 2024
• Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
  Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang · 30 May 2024
• AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization
  Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, Zhaoxia Yin, Hang Su · 30 May 2024
• AI Risk Management Should Incorporate Both Safety and Security
  Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, ..., Chaowei Xiao, Bo-wen Li, Dawn Song, Peter Henderson, Prateek Mittal · AAML · 29 May 2024
• A Theoretical Understanding of Self-Correction through In-context Alignment
  Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang · LRM · 28 May 2024
• Privacy-Aware Visual Language Models
  Laurens Samson, Nimrod Barazani, S. Ghebreab, Yukiyasu Asano · PILM, VLM · 27 May 2024
• Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
  Chia-Yi Hsu, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-ying Huang · 27 May 2024
• Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models
  Johan S Daniel, Anand Pal · 23 May 2024
• Risks and Opportunities of Open-Source Generative AI
  Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster · 14 May 2024
• Near to Mid-term Risks and Opportunities of Open-Source Generative AI
  Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster · 25 Apr 2024
• Protecting Your LLMs with Information Bottleneck
  Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian · KELM, AAML · 22 Apr 2024
• JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
  Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen · 12 Apr 2024
• AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
  Zeyi Liao, Huan Sun · AAML · 11 Apr 2024
• Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
  Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai-xiang Chen, Feng Zhao · LLMAG, ALM, AIFin · 19 Mar 2024
• RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
  Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, D. Song, Bo-wen Li · AAML, KELM · 19 Mar 2024
• EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
  Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, ..., Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang · 18 Mar 2024
• How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
  Jen-tse Huang, E. Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu · ELM, LLMAG · 18 Mar 2024
• Tastle: Distract Large Language Models for Automatic Jailbreak Attack
  Zeguan Xiao, Yan Yang, Guanhua Chen, Yun-Nung Chen · AAML · 13 Mar 2024
• CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
  Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma · ALM, ELM, AAML · 12 Mar 2024
• The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
  Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, ..., Yan Shoshitaishvili, Jimmy Ba, K. Esvelt, Alexandr Wang, Dan Hendrycks · ELM · 05 Mar 2024
• Accelerating Greedy Coordinate Gradient via Probe Sampling
  Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh · 02 Mar 2024
• Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
  Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen · AAML · 28 Feb 2024
• CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
  Huijie Lv, Xiao Wang, Yuan Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang · AAML · 26 Feb 2024
• Defending LLMs against Jailbreaking Attacks via Backtranslation
  Yihan Wang, Zhouxing Shi, Andrew Bai, Cho-Jui Hsieh · AAML · 26 Feb 2024
• How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
  Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee · 23 Feb 2024
• Defending Jailbreak Prompts via In-Context Adversarial Game
  Yujun Zhou, Yufei Han, Haomin Zhuang, Kehan Guo, Zhenwen Liang, Hongyan Bao, Xiangliang Zhang · LLMAG, AAML · 20 Feb 2024
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
Divij Handa
Advait Chirmule
Bimal Gajera
Chitta Baral
Chitta Baral
42
18
0
16 Feb 2024
• Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
  Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao · ELM · 14 Feb 2024
• Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks
  Yixin Cheng, Markos Georgopoulos, V. Cevher, Grigorios G. Chrysos · AAML · 14 Feb 2024
• SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
  Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran · AAML · 14 Feb 2024
• Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
  Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min-Bin Lin · LLMAG, LM&Ro · 13 Feb 2024
• Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
  Guangyu Shen, Shuyang Cheng, Kai-xian Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang · 08 Feb 2024
• Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
  Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson · AAML · 07 Feb 2024
• GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
  Haibo Jin, Ruoxi Chen, Andy Zhou, Yang Zhang, Haohan Wang · LLMAG · 05 Feb 2024
• Weak-to-Strong Jailbreaking on Large Language Models
  Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Yang Wang · 30 Jan 2024
• Red-Teaming for Generative AI: Silver Bullet or Security Theater?
  Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari · AAML · 29 Jan 2024
• An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
  Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu · 27 Jan 2024
• MULTIVERSE: Exposing Large Language Model Alignment Problems in Diverse Worlds
  Xiaolong Jin, Zhuo Zhang, Xiangyu Zhang · 25 Jan 2024
• R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
  Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, ..., Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu · ELM · 18 Jan 2024
• How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
  Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi · 12 Jan 2024
• Intention Analysis Makes LLMs A Good Jailbreak Defender
  Yuqi Zhang, Liang Ding, Lefei Zhang, Dacheng Tao · LLMSV · 12 Jan 2024
• Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
  Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, ..., Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li · 11 Jan 2024
• Maatphor: Automated Variant Analysis for Prompt Injection Attacks
  Ahmed Salem, Andrew J. Paverd, Boris Köpf · 12 Dec 2023
• Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
  Zhuo Zhang, Guangyu Shen, Guanhong Tao, Shuyang Cheng, Xiangyu Zhang · 08 Dec 2023