ResearchTrend.AI


Mitigating Exaggerated Safety in Large Language Models
arXiv:2405.05418 · 8 May 2024
Ruchi Bhalani, Ruchira Ray

Papers citing "Mitigating Exaggerated Safety in Large Language Models"

2 papers shown

1. Training language models to follow instructions with human feedback
   Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
   04 Mar 2022

2. Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
   Hannah Rose Kirk, B. Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale
   12 Aug 2021