Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.03191
Cited By
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
6 November 2023
Xuan Li
Zhanke Zhou
Jianing Zhu
Jiangchao Yao
Tongliang Liu
Bo Han
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
23 / 23 papers shown
Title
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
X. Jia
Yingfei Sun
Qianqian Xu
Q. Huang
AAML
63
0
0
03 May 2025
Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao
Kang Chen
Yuanguo Lin
Kangkang Li
Yunxuan Liu
Hefeng Chen
Xingwang Huang
Yuanhui Yu
AAML
54
0
0
02 May 2025
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng
Xiaolong Jin
Jinyuan Jia
X. Zhang
AAML
43
0
0
27 Feb 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
45
0
0
24 Feb 2025
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Rui Li
Peiyi Wang
Jingyuan Ma
Di Zhang
Lei Sha
Zhifang Sui
LLMAG
44
0
0
22 Feb 2025
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen
Yuzhou Nie
Wenbo Guo
Xiangyu Zhang
105
9
0
28 Jan 2025
Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment
Melissa Kazemi Rad
Huy Nghiem
Andy Luo
Sahil Wadhwa
Mohammad Sorower
Stephen Rawls
AAML
89
2
0
22 Jan 2025
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
Yang Ouyang
Hengrui Gu
Shuhang Lin
Wenyue Hua
Jie Peng
B. Kailkhura
Tianlong Chen
Kaixiong Zhou
Kaixiong Zhou
AAML
31
1
0
05 Jan 2025
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong
Wenbo Hu
Wei Xu
Tianxing He
67
0
0
19 Dec 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
47
3
0
17 Nov 2024
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao
Kejiang Chen
W. Zhang
Nenghai Yu
AAML
38
0
0
03 Nov 2024
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao
Xiang Zheng
Lin Luo
Yige Li
Xingjun Ma
Yu-Gang Jiang
VLM
AAML
50
3
0
28 Oct 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
40
4
0
05 Aug 2024
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Yue Zhou
Henry Peng Zou
Barbara Maria Di Eugenio
Yang Zhang
HILM
LRM
37
1
0
01 Jul 2024
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Jiayi Mao
Xueqi Cheng
AAML
40
9
0
17 Jun 2024
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li
Qianshan Wei
Chuanyi Zhang
Guilin Qi
Miaozeng Du
Yongrui Chen
Sheng Bi
Fan Liu
VLM
MU
62
12
0
21 May 2024
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem
Omar Mahmoud
Niloofar Mireshghallah
Hyunwoo J. Kim
Yulia Tsvetkov
Yejin Choi
Sherif Saad
Santu Rana
47
18
0
05 Mar 2024
FedImpro: Measuring and Improving Client Update in Federated Learning
Zhenheng Tang
Yonggang Zhang
S. Shi
Xinmei Tian
Tongliang Liu
Bo Han
Xiaowen Chu
FedML
13
13
0
10 Feb 2024
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Jason Vega
Isha Chaudhary
Changming Xu
Gagandeep Singh
AAML
14
18
0
19 Dec 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
1