Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
arXiv:2311.00172 · 31 October 2023
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
Papers citing "Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield"
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022