arXiv:2406.17104
Automated Adversarial Discovery for Safety Classifiers
24 June 2024
Yash Kumar Lal, Preethi Lahoti, Aradhana Sinha, Yao Qin, Ananth Balashankar
Papers citing "Automated Adversarial Discovery for Safety Classifiers"
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, ..., Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
SyDa
26 Feb 2024
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer
AAML
GAN
17 Apr 2018