2012.15606
HateCheck: Functional Tests for Hate Speech Detection Models
31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
Papers citing
"HateCheck: Functional Tests for Hate Speech Detection Models"
43 / 143 papers shown
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
8
10
0
24 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
23
10
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
23
3
0
19 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
23
0
0
14 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
34
7
0
14 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
51
33
0
09 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Agostina Calabrese
Bjorn Ross
Mirella Lapata
27
10
0
06 Oct 2022
Hypothesis Engineering for Zero-Shot Hate Speech Detection
Janis Goldzycher
Gerold Schneider
16
7
0
03 Oct 2022
Debiasing Word Embeddings with Nonlinear Geometry
Lu Cheng
Nayoung Kim
Huan Liu
16
5
0
29 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
443
0
23 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
17
210
0
05 Aug 2022
Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions
Urja Khurana
I. Vermeulen
Eric T. Nalisnick
M. V. Noorloos
Antske Fokkens
AILaw
15
16
0
30 Jun 2022
Flexible text generation for counterfactual fairness probing
Zee Fryer
Vera Axelrod
Ben Packer
Alex Beutel
Jilin Chen
Kellie Webster
17
18
0
28 Jun 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
22
65
0
20 Jun 2022
Adversarial Text Normalization
Joanna Bitton
Maya Pavlova
Ivan Evtimov
AAML
22
2
0
08 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
16
36
0
08 Jun 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
19
16
0
09 May 2022
Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection
Esma Balkir
I. Nejadgholi
Kathleen C. Fraser
S. Kiritchenko
FAtt
28
27
0
06 May 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation
Nitesh Goyal
Ian D Kivlichan
Rachel Rosen
Lucy Vasserman
14
88
0
01 May 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Mithun Das
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
16
15
0
30 Apr 2022
Handling and Presenting Harmful Text in NLP Research
Hannah Rose Kirk
Abeba Birhane
Bertie Vidgen
Leon Derczynski
13
47
0
29 Apr 2022
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
Pedro Henrique Luz de Araujo
Benjamin Roth
14
2
0
08 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
80
5,983
0
05 Apr 2022
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model
Fei Mi
Yitong Li
Yulong Zeng
Jingyan Zhou
Yasheng Wang
Chuanfei Xu
Lifeng Shang
Xin Jiang
Shiqi Zhao
Qun Liu
ALM
37
18
0
31 Mar 2022
Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments
Antonis Maronikolakis
Axel Wisiorek
Leah Nann
Haris Jabbar
Sahana Udupa
Hinrich Schütze
22
24
0
22 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
11
344
0
17 Mar 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
8
609
0
07 Feb 2022
Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab
Abraham Israeli
Oren Tsur
25
1
0
27 Jan 2022
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
Paul Röttger
Bertie Vidgen
Dirk Hovy
J. Pierrehumbert
14
11
0
14 Dec 2021
What Do You See in this Patient? Behavioral Testing of Clinical NLP Models
Betty van Aken
S. Herrmann
Alexander Loser
18
11
0
30 Nov 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
367
0
15 Oct 2021
Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language
Flor Miriam Plaza del Arco
S. Halat
Sebastian Padó
Roman Klinger
18
34
0
21 Sep 2021
Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis
Saif M. Mohammad
17
67
0
17 Sep 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
32
105
0
07 Jul 2021
An Information Retrieval Approach to Building Datasets for Hate Speech Detection
Md. Mustafizur Rahman
Dinesh Balakrishnan
Dhiraj Murthy
Mucahid Kutlu
Matthew Lease
8
24
0
17 Jun 2021
pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks
Juan Manuel Pérez
Mariela Rajngewerc
Juan Carlos Giudici
D. Furman
Franco Luque
Laura Alonso Alemany
María Vanina Martínez
16
29
0
17 Jun 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
25
56
0
21 May 2021
Multilingual Offensive Language Identification for Low-resource Languages
Tharindu Ranasinghe
Marcos Zampieri
17
64
0
12 May 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Mohit Bansal
Christopher Potts
Adina Williams
16
387
0
07 Apr 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen
Tristan Thrush
Zeerak Talat
Douwe Kiela
11
242
0
31 Dec 2020
Confronting Abusive Language Online: A Survey from the Ethical and Human Rights Perspective
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
AILaw
20
83
0
22 Dec 2020
A Framework for the Computational Linguistic Analysis of Dehumanization
Julia Mendelsohn
Yulia Tsvetkov
Dan Jurafsky
82
89
0
06 Mar 2020
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
237
319
0
21 Aug 2019