Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
arXiv:2311.00172 · 31 October 2023 · AAML
Papers citing "Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield" (2 of 2 shown)
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML · 05 Sep 2024
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022