HateCheck: Functional Tests for Hate Speech Detection Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

50 / 162 papers shown
Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection
Mithun Das
Saurabh Kumar Pandey
Animesh Mukherjee
301
13
0
22 May 2023
Cross-functional Analysis of Generalisation in Behavioural Learning
Transactions of the Association for Computational Linguistics (TACL), 2023
Pedro Henrique Luz de Araujo
Benjamin Roth
186
4
0
22 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
International Conference on Human Factors in Computing Systems (CHI), 2023
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
246
20
0
12 Apr 2023
Interpretable Unified Language Checking
Tianhua Zhang
Hongyin Luo
Yung-Sung Chuang
Wei Fang
Luc Gaitskell
Thomas Hartvigsen
Xixin Wu
D. Fox
Helen M. Meng
James R. Glass
203
31
0
07 Apr 2023
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
Antonis Maronikolakis
Abdullatif Köksal
Hinrich Schütze
374
0
0
04 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
267
48
0
31 Mar 2023
A Federated Approach for Hate Speech Detection
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Jay Gala
Deep Gandhi
Jash Mehta
Zeerak Talat
129
5
0
18 Feb 2023
Auditing large language models: a three-layered approach
AI and Ethics (AE), 2023
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
486
270
0
16 Feb 2023
Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection
The Web Conference (WWW), 2023
Soumyajit Gupta
Sooyong Lee
Maria De-Arteaga
Matthew Lease
234
17
0
14 Feb 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
140
2
0
28 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
183
9
0
05 Jan 2023
Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI
Lorena Piedras
Lucas Rosenblatt
Julia Wilkins
268
11
0
05 Jan 2023
Evaluating Psychological Safety of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xingxuan Li
Yutong Li
Linlin Liu
Shafiq Joty
Lidong Bing
LM&MA
229
32
0
20 Dec 2022
Manifestations of Xenophobia in AI Systems
AI & Society (AS), 2022
Nenad Tomašev
J. L. Maynard
Iason Gabriel
405
11
0
15 Dec 2022
Human-in-the-Loop Hate Speech Classification in a Multilingual Context
Ana Kotarcic
Dominik Hangartner
Fabrizio Gilardi
Selina Kurer
K. Donnay
216
4
0
05 Dec 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhexin Zhang
Jiale Cheng
Hao Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Shiyu Huang
SILM
368
11
0
04 Dec 2022
Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning
Md. Tawkat Islam Khondaker
Muhammad Abdul-Mageed
L. Lakshmanan
95
2
0
11 Nov 2022
CoRAL: a Context-aware Croatian Abusive Language Dataset
Ravi Shekhar
Mladen Karan
Matthew Purver
229
7
0
11 Nov 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
244
1
0
08 Nov 2022
System Demo: Tool and Infrastructure for Offensive Language Error Analysis (OLEA) in English
M. Grace
Xajavion Jay Seabrum
Dananjay Srinivas
Alexis Palmer
94
0
0
28 Oct 2022
"It's Not Just Hate": A Multi-Dimensional Perspective on Detecting Harmful Speech Online
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Federico Bianchi
S. A. Hills
Patrícia G. C. Rossini
Dirk Hovy
Rebekah Tromble
N. Tintarev
193
17
0
28 Oct 2022
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
332
16
0
24 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
192
12
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
190
5
0
19 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
International Conference on Computational Linguistics (COLING), 2022
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
183
0
0
14 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Debora Nozza
Dirk Hovy
240
8
0
14 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
193
40
0
09 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Transactions of the Association for Computational Linguistics (TACL), 2022
Agostina Calabrese
Bjorn Ross
Mirella Lapata
203
12
0
06 Oct 2022
Hypothesis Engineering for Zero-Shot Hate Speech Detection
Workshop on Trolling, Aggression and Cyberbullying (TRAC), 2022
Janis Goldzycher
Gerold Schneider
227
10
0
03 Oct 2022
Debiasing Word Embeddings with Nonlinear Geometry
International Conference on Computational Linguistics (COLING), 2022
Lu Cheng
Nayoung Kim
Huan Liu
188
5
0
29 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
603
633
0
23 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
AAAI Conference on Artificial Intelligence (AAAI), 2022
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
259
341
0
05 Aug 2022
Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions
Urja Khurana
I. Vermeulen
Eric T. Nalisnick
M. V. Noorloos
Antske Fokkens
AILaw
129
23
0
30 Jun 2022
Flexible text generation for counterfactual fairness probing
Zee Fryer
Vera Axelrod
Ben Packer
Alex Beutel
Jilin Chen
Kellie Webster
122
22
0
28 Jun 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
179
79
0
20 Jun 2022
Adversarial Text Normalization
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Joanna Bitton
Maya Pavlova
Ivan Evtimov
AAML
187
3
0
08 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
289
40
0
08 Jun 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
236
19
0
09 May 2022
Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Esma Balkir
I. Nejadgholi
Kathleen C. Fraser
S. Kiritchenko
FAtt
189
30
0
06 May 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation
Nitesh Goyal
Ian D Kivlichan
Rachel Rosen
Lucy Vasserman
265
110
0
01 May 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
International Conference on Language Resources and Evaluation (LREC), 2022
Mithun Das
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
222
25
0
30 Apr 2022
Handling and Presenting Harmful Text in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hannah Rose Kirk
Abeba Birhane
Bertie Vidgen
Leon Derczynski
290
58
0
29 Apr 2022
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
Pedro Henrique Luz de Araujo
Benjamin Roth
136
2
0
08 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Journal of Machine Learning Research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
1.2K
7,457
0
05 Apr 2022
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model
Fei Mi
Yitong Li
Yulong Zeng
Jingyan Zhou
Yasheng Wang
Chuanfei Xu
Lifeng Shang
Xin Jiang
Shiqi Zhao
Qun Liu
ALM
340
17
0
31 Mar 2022
Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments
Findings (Findings), 2022
Antonis Maronikolakis
Axel Wisiorek
Leah Nann
Haris Jabbar
Sahana Udupa
Hinrich Schütze
211
24
0
22 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
396
510
0
17 Mar 2022
Red Teaming Language Models with Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
448
862
0
07 Feb 2022
Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab
Abraham Israeli
Oren Tsur
155
1
0
27 Jan 2022
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
Paul Röttger
Bertie Vidgen
Dirk Hovy
J. Pierrehumbert
144
14
0
14 Dec 2021