Tastle: Distract Large Language Models for Automatic Jailbreak Attack
arXiv 2403.08424 · 13 March 2024
Zeguan Xiao, Yan Yang, Guanhua Chen, Yun-Nung Chen
AAML
Papers citing "Tastle: Distract Large Language Models for Automatic Jailbreak Attack" (4 of 4 papers shown)
Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu
AAML
02 May 2025
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Yang Jiao, X. Wang, Kai Yang
AAML, SILM
10 Apr 2025
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
SILM
19 Sep 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022