HateCheck: Functional Tests for Hate Speech Detection Models

31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

43 / 143 papers shown
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
8
10
0
24 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
23
10
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
23
3
0
19 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
23
0
0
14 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
34
7
0
14 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
51
33
0
09 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Agostina Calabrese
Bjorn Ross
Mirella Lapata
27
10
0
06 Oct 2022
Hypothesis Engineering for Zero-Shot Hate Speech Detection
Janis Goldzycher
Gerold Schneider
16
7
0
03 Oct 2022
Debiasing Word Embeddings with Nonlinear Geometry
Lu Cheng
Nayoung Kim
Huan Liu
16
5
0
29 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
443
0
23 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
17
210
0
05 Aug 2022
Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions
Urja Khurana
I. Vermeulen
Eric T. Nalisnick
M. V. Noorloos
Antske Fokkens
AILaw
15
16
0
30 Jun 2022
Flexible text generation for counterfactual fairness probing
Zee Fryer
Vera Axelrod
Ben Packer
Alex Beutel
Jilin Chen
Kellie Webster
17
18
0
28 Jun 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
22
65
0
20 Jun 2022
Adversarial Text Normalization
Joanna Bitton
Maya Pavlova
Ivan Evtimov
AAML
22
2
0
08 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
16
36
0
08 Jun 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
19
16
0
09 May 2022
Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection
Esma Balkir
I. Nejadgholi
Kathleen C. Fraser
S. Kiritchenko
FAtt
28
27
0
06 May 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation
Nitesh Goyal
Ian D Kivlichan
Rachel Rosen
Lucy Vasserman
14
88
0
01 May 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Mithun Das
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
16
15
0
30 Apr 2022
Handling and Presenting Harmful Text in NLP Research
Hannah Rose Kirk
Abeba Birhane
Bertie Vidgen
Leon Derczynski
13
47
0
29 Apr 2022
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
Pedro Henrique Luz de Araujo
Benjamin Roth
14
2
0
08 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
80
5,983
0
05 Apr 2022
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model
Fei Mi
Yitong Li
Yulong Zeng
Jingyan Zhou
Yasheng Wang
Chuanfei Xu
Lifeng Shang
Xin Jiang
Shiqi Zhao
Qun Liu
ALM
37
18
0
31 Mar 2022
Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments
Antonis Maronikolakis
Axel Wisiorek
Leah Nann
Haris Jabbar
Sahana Udupa
Hinrich Schütze
22
24
0
22 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
11
344
0
17 Mar 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
8
609
0
07 Feb 2022
Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab
Abraham Israeli
Oren Tsur
25
1
0
27 Jan 2022
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
Paul Röttger
Bertie Vidgen
Dirk Hovy
J. Pierrehumbert
14
11
0
14 Dec 2021
What Do You See in this Patient? Behavioral Testing of Clinical NLP Models
Betty van Aken
S. Herrmann
Alexander Loser
18
11
0
30 Nov 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
367
0
15 Oct 2021
Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language
Flor Miriam Plaza del Arco
S. Halat
Sebastian Padó
Roman Klinger
18
34
0
21 Sep 2021
Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis
Saif M. Mohammad
17
67
0
17 Sep 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
32
105
0
07 Jul 2021
An Information Retrieval Approach to Building Datasets for Hate Speech Detection
Md. Mustafizur Rahman
Dinesh Balakrishnan
Dhiraj Murthy
Mucahid Kutlu
Matthew Lease
8
24
0
17 Jun 2021
pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks
Juan Manuel Pérez
Mariela Rajngewerc
Juan Carlos Giudici
D. Furman
Franco Luque
Laura Alonso Alemany
María Vanina Martínez
16
29
0
17 Jun 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
25
56
0
21 May 2021
Multilingual Offensive Language Identification for Low-resource Languages
Tharindu Ranasinghe
Marcos Zampieri
17
64
0
12 May 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Mohit Bansal
Christopher Potts
Adina Williams
16
387
0
07 Apr 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen
Tristan Thrush
Zeerak Talat
Douwe Kiela
11
242
0
31 Dec 2020
Confronting Abusive Language Online: A Survey from the Ethical and Human Rights Perspective
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
AILaw
20
83
0
22 Dec 2020
A Framework for the Computational Linguistic Analysis of Dehumanization
Julia Mendelsohn
Yulia Tsvetkov
Dan Jurafsky
82
89
0
06 Mar 2020
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
237
319
0
21 Aug 2019