Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield

31 October 2023
Jinhwa Kim
Ali Derakhshan
Ian G. Harris
AAML

Papers citing "Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield"

1 / 1 papers shown
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
23 Aug 2022