Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
arXiv 2206.08325 · 16 June 2022
Maribeth Rauh, John F. J. Mellor, J. Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, G. Irving, Iason Gabriel, William S. Isaac, Lisa Anne Hendricks
Papers citing "Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models" (9 papers shown)
SAGE: A Generic Framework for LLM Safety Evaluation · Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat · 28 Apr 2025 [ELM]
Evaluating the Propensity of Generative AI for Producing Harmful Disinformation During an Election Cycle · Erik J Schlicht · 20 Jan 2025
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews · Hye Sun Yun, Iain J. Marshall, T. Trikalinos, Byron C. Wallace · 19 May 2023
PaLM 2 Technical Report · Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, ..., Ce Zheng, Wei Zhou, Denny Zhou, Slav Petrov, Yonghui Wu · 17 May 2023 [ReLM, LRM]
BBQ: A Hand-Built Bias Benchmark for Question Answering · Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman · 15 Oct 2021
Challenges in Detoxifying Language Models · Johannes Welbl, Amelia Glaese, J. Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang · 15 Sep 2021 [LM&MA]
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP · Timo Schick, Sahana Udupa, Hinrich Schütze · 28 Feb 2021
The Woman Worked as a Babysitter: On Biases in Language Generation · Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng · 03 Sep 2019
A Survey on Bias and Fairness in Machine Learning · Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan · 23 Aug 2019 [SyDa, FaML]