Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.13077
Cited By
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
21 May 2024
Govind Ramesh
Yao Dou
Wei-ping Xu
PILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation"
11 / 11 papers shown
Title
Geneshift: Impact of different scenario shift on Jailbreaking LLM
Tianyi Wu
Zhiwei Xue
Yue Liu
Jiaheng Zhang
Bryan Hooi
See-Kiong Ng
28
0
0
10 Apr 2025
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges
Francisco Eiras
Eliott Zemour
Eric Lin
Vaikkunth Mugunthan
ELM
66
0
0
06 Mar 2025
Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
Libo Wang
LRM
AI4CE
42
0
0
14 Nov 2024
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu
Yue Feng
Zhao Xu
Lixin Su
Xinyu Ma
Dawei Yin
Hao Liu
ELM
22
7
0
11 Oct 2024
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja
Nilay Pochhi
AAML
46
1
0
09 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu
Xiaoxin He
Miao Xiong
Jinlan Fu
Shumin Deng
Bryan Hooi
AAML
23
12
0
02 Oct 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wenyuan Xu
32
6
0
28 Aug 2024
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu
Fan Liu
Hao Liu
AAML
32
7
0
13 Jun 2024
AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization
Jiawei Chen
Xiao Yang
Zhengwei Fang
Yu Tian
Yinpeng Dong
Zhaoxia Yin
Hang Su
19
1
0
30 May 2024
Attacking Large Language Models with Projected Gradient Descent
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
AAML
SILM
42
48
0
14 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
1