Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.03299
Cited By
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
5 February 2024
Haibo Jin
Ruoxi Chen
Andy Zhou
Yang Zhang
Haohan Wang
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models"
12 / 12 papers shown
Title
Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Yakai Li
Jiekang Hu
Weiduan Sang
Luping Ma
Jing Xie
Weijuan Zhang
Aimin Yu
Shijie Zhao
Qingjia Huang
Qihang Zhou
AAML
45
0
0
28 Apr 2025
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu
Peiran Li
Edward Suh
Yevgeniy Vorobeychik
Zhuoqing Mao
Somesh Jha
Patrick McDaniel
Huan Sun
Bo Li
Chaowei Xiao
28
17
0
03 Oct 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
47
36
0
01 Aug 2024
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Weidi Luo
Siyuan Ma
Xiaogeng Liu
Xiaoyu Guo
Chaowei Xiao
AAML
67
69
0
03 Apr 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
Qibing Ren
Chang Gao
Jing Shao
Junchi Yan
Xin Tan
Wai Lam
Lizhuang Ma
ALM
ELM
AAML
42
21
0
12 Mar 2024
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
Divij Handa
Advait Chirmule
Bimal Gajera
Chitta Baral
Chitta Baral
42
18
0
16 Feb 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
154
280
0
14 Oct 2023
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
102
84
0
21 Aug 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Shaoxiong Ji
Shirui Pan
Erik Cambria
Pekka Marttinen
Philip S. Yu
169
1,877
0
02 Feb 2020
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
228
29,632
0
16 Jan 2013
1