2012.15606
HateCheck: Functional Tests for Hate Speech Detection Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
Papers citing
"HateCheck: Functional Tests for Hate Speech Detection Models"
50 / 162 papers shown
Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection
Mithun Das
Saurabh Kumar Pandey
Animesh Mukherjee
301
13
0
22 May 2023
Cross-functional Analysis of Generalisation in Behavioural Learning
Transactions of the Association for Computational Linguistics (TACL), 2023
Pedro Henrique Luz de Araujo
Benjamin Roth
186
4
0
22 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
International Conference on Human Factors in Computing Systems (CHI), 2023
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
246
20
0
12 Apr 2023
Interpretable Unified Language Checking
Tianhua Zhang
Hongyin Luo
Yung-Sung Chuang
Wei Fang
Luc Gaitskell
Thomas Hartvigsen
Xixin Wu
D. Fox
Helen M. Meng
James R. Glass
203
31
0
07 Apr 2023
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
Antonis Maronikolakis
Abdullatif Köksal
Hinrich Schütze
374
0
0
04 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
267
48
0
31 Mar 2023
A Federated Approach for Hate Speech Detection
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Jay Gala
Deep Gandhi
Jash Mehta
Zeerak Talat
129
5
0
18 Feb 2023
Auditing large language models: a three-layered approach
AI and Ethics (AE), 2023
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
486
270
0
16 Feb 2023
Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection
The Web Conference (WWW), 2023
Soumyajit Gupta
Sooyong Lee
Maria De-Arteaga
Matthew Lease
234
17
0
14 Feb 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
140
2
0
28 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
183
9
0
05 Jan 2023
Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI
Lorena Piedras
Lucas Rosenblatt
Julia Wilkins
268
11
0
05 Jan 2023
Evaluating Psychological Safety of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xingxuan Li
Yutong Li
Linlin Liu
Shafiq Joty
Lidong Bing
LM&MA
229
32
0
20 Dec 2022
Manifestations of Xenophobia in AI Systems
AI & Society (AS), 2022
Nenad Tomašev
J. L. Maynard
Iason Gabriel
405
11
0
15 Dec 2022
Human-in-the-Loop Hate Speech Classification in a Multilingual Context
Ana Kotarcic
Dominik Hangartner
Fabrizio Gilardi
Selina Kurer
K. Donnay
216
4
0
05 Dec 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhexin Zhang
Jiale Cheng
Hao Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Shiyu Huang
SILM
368
11
0
04 Dec 2022
Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning
Md. Tawkat Islam Khondaker
Muhammad Abdul-Mageed
L. Lakshmanan
95
2
0
11 Nov 2022
CoRAL: a Context-aware Croatian Abusive Language Dataset
Ravi Shekhar
Mladen Karan
Matthew Purver
229
7
0
11 Nov 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
244
1
0
08 Nov 2022
System Demo: Tool and Infrastructure for Offensive Language Error Analysis (OLEA) in English
M. Grace
Xajavion Jay Seabrum
Dananjay Srinivas
Alexis Palmer
94
0
0
28 Oct 2022
"It's Not Just Hate": A Multi-Dimensional Perspective on Detecting Harmful Speech Online
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Federico Bianchi
S. A. Hills
Patrícia G. C. Rossini
Dirk Hovy
Rebekah Tromble
N. Tintarev
193
17
0
28 Oct 2022
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
332
16
0
24 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
192
12
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
190
5
0
19 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
International Conference on Computational Linguistics (COLING), 2022
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
183
0
0
14 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Debora Nozza
Dirk Hovy
240
8
0
14 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
193
40
0
09 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Transactions of the Association for Computational Linguistics (TACL), 2022
Agostina Calabrese
Bjorn Ross
Mirella Lapata
203
12
0
06 Oct 2022
Hypothesis Engineering for Zero-Shot Hate Speech Detection
Workshop on Trolling, Aggression and Cyberbullying (TRAC), 2022
Janis Goldzycher
Gerold Schneider
227
10
0
03 Oct 2022
Debiasing Word Embeddings with Nonlinear Geometry
International Conference on Computational Linguistics (COLING), 2022
Lu Cheng
Nayoung Kim
Huan Liu
188
5
0
29 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
603
633
0
23 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
AAAI Conference on Artificial Intelligence (AAAI), 2022
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
259
341
0
05 Aug 2022
Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions
Urja Khurana
I. Vermeulen
Eric T. Nalisnick
M. V. Noorloos
Antske Fokkens
AILaw
129
23
0
30 Jun 2022
Flexible text generation for counterfactual fairness probing
Zee Fryer
Vera Axelrod
Ben Packer
Alex Beutel
Jilin Chen
Kellie Webster
122
22
0
28 Jun 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
179
79
0
20 Jun 2022
Adversarial Text Normalization
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Joanna Bitton
Maya Pavlova
Ivan Evtimov
AAML
187
3
0
08 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
289
40
0
08 Jun 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
236
19
0
09 May 2022
Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Esma Balkir
I. Nejadgholi
Kathleen C. Fraser
S. Kiritchenko
FAtt
189
30
0
06 May 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation
Nitesh Goyal
Ian D Kivlichan
Rachel Rosen
Lucy Vasserman
265
110
0
01 May 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
International Conference on Language Resources and Evaluation (LREC), 2022
Mithun Das
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
222
25
0
30 Apr 2022
Handling and Presenting Harmful Text in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hannah Rose Kirk
Abeba Birhane
Bertie Vidgen
Leon Derczynski
290
58
0
29 Apr 2022
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
Pedro Henrique Luz de Araujo
Benjamin Roth
136
2
0
08 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Journal of Machine Learning Research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
1.2K
7,457
0
05 Apr 2022
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model
Fei Mi
Yitong Li
Yulong Zeng
Jingyan Zhou
Yasheng Wang
Chuanfei Xu
Lifeng Shang
Xin Jiang
Shiqi Zhao
Qun Liu
ALM
340
17
0
31 Mar 2022
Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments
Findings (Findings), 2022
Antonis Maronikolakis
Axel Wisiorek
Leah Nann
Haris Jabbar
Sahana Udupa
Hinrich Schütze
211
24
0
22 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
396
510
0
17 Mar 2022
Red Teaming Language Models with Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
448
862
0
07 Feb 2022
Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab
Abraham Israeli
Oren Tsur
155
1
0
27 Jan 2022
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
Paul Röttger
Bertie Vidgen
Dirk Hovy
J. Pierrehumbert
144
14
0
14 Dec 2021