arXiv: 2405.05418
Mitigating Exaggerated Safety in Large Language Models
8 May 2024
Ruchi Bhalani, Ruchira Ray
Papers citing "Mitigating Exaggerated Safety in Large Language Models" (2 / 2 papers shown)
1. Training language models to follow instructions with human feedback
   Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
   04 Mar 2022
2. Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
   Hannah Rose Kirk, B. Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale
   12 Aug 2021