GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

12 August 2023
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Pinjia He
Shuming Shi
Zhaopeng Tu
SILM

Papers citing "GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher"

50 / 180 papers shown
Adversarial Attacks on Large Language Models Using Regularized Relaxation
Samuel Jacob Chacko
Sajib Biswas
Chashi Mahiul Islam
Fatema Tabassum Liza
Xiuwen Liu
AAML
26
1
0
24 Oct 2024
SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
Aidan Wong
He Cao
Zijing Liu
Yu Li
33
2
0
21 Oct 2024
SPIN: Self-Supervised Prompt INjection
Leon Zhou
Junfeng Yang
Chengzhi Mao
AAML
SILM
23
0
0
17 Oct 2024
Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
Siddharth Mitra
Andre Wibisono
44
23
0
14 Oct 2024
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
Rui Min
Zeyu Qin
Nevin L. Zhang
Li Shen
Minhao Cheng
AAML
31
4
0
13 Oct 2024
BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
Xinyuan Wang
Victor Shea-Jay Huang
Renmiao Chen
Hao Wang
C. Pan
Lei Sha
Minlie Huang
AAML
23
2
0
13 Oct 2024
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Zijun Wang
Haoqin Tu
J. Mei
Bingchen Zhao
Y. Wang
Cihang Xie
24
5
0
11 Oct 2024
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
Peiran Wang
Xiaogeng Liu
Chaowei Xiao
AAML
24
3
0
11 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
33
8
0
09 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
Tianyu Wu
Lingrui Mei
Ruibin Yuan
Lujun Li
Wei Xue
Yike Guo
35
1
0
04 Oct 2024
Permissive Information-Flow Analysis for Large Language Models
Shoaib Ahmed Siddiqui
Radhika Gaonkar
Boris Köpf
David M. Krueger
Andrew J. Paverd
Ahmed Salem
Shruti Tople
Lukas Wutschitz
Menglin Xia
Santiago Zanella Béguelin
18
1
0
04 Oct 2024
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen
Dongcheng Zhao
Yiting Dong
Xiang-Yu He
Yi Zeng
AAML
45
0
0
03 Oct 2024
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu
Peiran Li
Edward Suh
Yevgeniy Vorobeychik
Zhuoqing Mao
Somesh Jha
Patrick McDaniel
Huan Sun
Bo Li
Chaowei Xiao
30
17
0
03 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu
Xiaoxin He
Miao Xiong
Jinlan Fu
Shumin Deng
Bryan Hooi
AAML
29
12
0
02 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
62
2
0
02 Oct 2024
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang
Maximilian Li
Leonard Tang
AAML
70
5
0
02 Oct 2024
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu
Yangguang Shao
Hanwen Miao
Junzheng Shi
SILM
AAML
64
4
0
23 Sep 2024
PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach
Zhihao Lin
Wei Ma
Mingyi Zhou
Yanjie Zhao
Haoyu Wang
Yang Liu
Jun Wang
Li Li
AAML
30
5
0
21 Sep 2024
AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
Lijia Lv
Weigang Zhang
Xuehai Tang
Jie Wen
Feng Liu
Jizhong Han
Songlin Hu
AAML
24
2
0
11 Sep 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
47
1
0
05 Sep 2024
Learning to Ask: When LLM Agents Meet Unclear Instruction
Wenxuan Wang
Juluan Shi
Chaozheng Wang
Cheryl Lee
Youliang Yuan
Jen-tse Huang
Wenxiang Jiao
Michael R. Lyu
LLMAG
24
8
0
31 Aug 2024
Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
Tom Gibbs
Ethan Kosak-Hine
George Ingebretsen
Jason Zhang
Julius Broomfield
Sara Pieri
Reihaneh Iranmanesh
Reihaneh Rabbany
Kellin Pelrine
AAML
28
6
0
29 Aug 2024
FRACTURED-SORRY-Bench: Framework for Revealing Attacks in Conversational Turns Undermining Refusal Efficacy and Defenses over SORRY-Bench
Aman Priyanshu
Supriti Vijay
AAML
23
1
0
28 Aug 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wenyuan Xu
32
6
0
28 Aug 2024
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
Hongfu Liu
Yuxi Xie
Ye Wang
Michael Shieh
33
2
0
27 Aug 2024
Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen
Yi Liu
Dongxia Wang
Jiaying Chen
Wenhai Wang
44
1
0
18 Aug 2024
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
Xinbei Ma
Yiting Wang
Yao Yao
Tongxin Yuan
Aston Zhang
Zhuosheng Zhang
Hai Zhao
AAML
LLMAG
22
16
0
05 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su
Mingyu Lee
SangKeun Lee
30
7
0
02 Aug 2024
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
Zihui Wu
Haichang Gao
Jianping He
Ping Wang
17
6
0
25 Jul 2024
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu
Yishuo Cai
Z. Zhou
Renjie Gu
Haiqin Weng
Yan Liu
Tianwei Zhang
Wei Xu
Han Qiu
27
4
0
23 Jul 2024
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
Shi Lin
Rongchang Li
Xun Wang
Changting Lin
Wenpeng Xing
Meng Han
53
3
0
23 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
47
8
0
20 Jul 2024
The Better Angels of Machine Personality: How Personality Relates to LLM Safety
Jie M. Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
LLMAG
PILM
40
12
0
17 Jul 2024
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Jiahao Xu
Tian Liang
Pinjia He
Zhaopeng Tu
35
19
0
12 Jul 2024
Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture
Jiayang Song
Yuheng Huang
Zhehua Zhou
Lei Ma
37
6
0
10 Jul 2024
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi
Yule Liu
Zhen Sun
Tianshuo Cong
Xinlei He
Jiaxing Song
Ke Xu
Qi Li
AAML
34
77
0
05 Jul 2024
JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin
Shiyi Liu
Haotian Li
Xun Zhao
Huamin Qu
18
3
0
03 Jul 2024
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
Hayder Elesedy
Pedro M. Esperança
Silviu Vlad Oprea
Mete Ozay
KELM
18
2
0
03 Jul 2024
Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement
Zisu Huang
Xiaohua Wang
Feiran Zhang
Zhibo Xu
Cenyuan Zhang
Xiaoqing Zheng
Xuanjing Huang
AAML
LRM
27
4
0
01 Jul 2024
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi
Alexander Wei
Eric Wallace
Tony T. Wang
Nika Haghtalab
Jacob Steinhardt
SILM
AAML
34
28
0
28 Jun 2024
Jailbreaking LLMs with Arabic Transliteration and Arabizi
Mansour Al Ghanim
Saleh Almohaimeed
Mengxin Zheng
Yan Solihin
Qian Lou
26
2
0
26 Jun 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
Caishuang Huang
Wanxu Zhao
Rui Zheng
Huijie Lv
Shihan Dou
...
Junjie Ye
Yuming Yang
Tao Gui
Qi Zhang
Xuanjing Huang
LLMSV
AAML
40
7
0
26 Jun 2024
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
37
6
0
21 Jun 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie
Xiangyu Qi
Yi Zeng
Yangsibo Huang
Udari Madhushani Sehwag
...
Bo Li
Kai Li
Danqi Chen
Peter Henderson
Prateek Mittal
ALM
ELM
47
51
0
20 Jun 2024
Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
Shangqing Tu
Zhuoran Pan
Wenxuan Wang
Zhexin Zhang
Yuliang Sun
Jifan Yu
Hongning Wang
Lei Hou
Juanzi Li
ALM
42
1
0
17 Jun 2024
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu
Fan Liu
Hao Liu
AAML
35
7
0
13 Jun 2024
RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
Xuan Chen
Yuzhou Nie
Lu Yan
Yunshu Mao
Wenbo Guo
Xiangyu Zhang
21
6
0
13 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran
Jinyuan Liu
Yichen Gong
Jingyi Zheng
Xinlei He
Tianshuo Cong
Anyu Wang
ELM
42
10
0
13 Jun 2024
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang
Tianqing Zhu
Bo Liu
Ming Ding
Xu Guo
Dayong Ye
Wanlei Zhou
Philip S. Yu
PILM
52
16
0
12 Jun 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
71
8
0
08 Jun 2024