HateCheck: Functional Tests for Hate Speech Detection Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

Showing 50 of 162 citing papers (page 2 of 4).
Sexism Detection on a Data Diet
Web Science Conference (WebSci), 2024
Rabiraj Bandyopadhyay
Dennis Assenmacher
J. Alonso-Moral
Claudia Wagner
07 Jun 2024
Prompt Exploration with Prompt Regression
Michael Feffer
Ronald Xu
Yuekai Sun
Mikhail Yurochkin
17 May 2024
Mitigating Exaggerated Safety in Large Language Models
Ruchi Bhalani
Ruchira Ray
08 May 2024
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
Ri Chi Ng
Nirmalendu Prakash
Ming Shan Hee
K. T. W. Choo
Roy Ka-wei Lee
03 May 2024
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Manuel Tonneau
Diyi Liu
Samuel Fraiberger
Ralph Schroeder
Scott A. Hale
Paul Röttger
27 Apr 2024
Analyzing Toxicity in Deep Conversations: A Reddit Case Study
Vigneshwaran Shankaran
Rajesh Sharma
11 Apr 2024
NLP for Counterspeech against Hate: A Survey and How-To Guide
Helena Bonaldi
Yi-Ling Chung
Gavin Abercrombie
Marco Guerini
29 Mar 2024
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Janis Goldzycher
Paul Röttger
Gerold Schneider
28 Mar 2024
NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data
Manuel Tonneau
Pedro Vitor Quinta de Castro
Karim Lasri
I. Farouq
Lakshminarayanan Subramanian
Victor Orozco-Olvera
Samuel Fraiberger
28 Mar 2024
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
H. Nghiem
Hal Daumé
18 Mar 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
Lei Gao
Yue Niu
Tingting Tang
A. Avestimehr
Murali Annavaram
13 Mar 2024
Specification Overfitting in Artificial Intelligence
Artificial Intelligence Review (Artif Intell Rev), 2024
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
13 Mar 2024
Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection
Tharindu Kumarage
Amrita Bhattacharjee
Joshua Garland
12 Mar 2024
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Yiping Jin
Leo Wanner
A. Shvets
23 Feb 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
03 Feb 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
29 Jan 2024
Towards a Non-Ideal Methodological Framework for Responsible ML
International Conference on Human Factors in Computing Systems (CHI), 2024
Ramaravind Kommiya Mothilal
Shion Guha
Syed Ishtiaque Ahmed
20 Jan 2024
Muted: Multilingual Targeted Offensive Speech Identification and Visualization
Christoph Tillmann
Aashka Trivedi
Sara Rosenthal
Santosh Borse
Rong Zhang
Avirup Sil
Bishwaranjan Bhattacharjee
18 Dec 2023
Causal ATE Mitigates Unintended Bias in Controlled Text Generation
Rahul Madhavan
Kahini Wadhawan
19 Nov 2023
Functionality learning through specification instructions
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Pedro Henrique Luz de Araujo
Benjamin Roth
14 Nov 2023
People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Indira Sen
Dennis Assenmacher
Mattia Samory
Isabelle Augenstein
Wil M.P. van der Aalst
Claudia Wagner
02 Nov 2023
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data
Neural Information Processing Systems (NeurIPS), 2023
B. V. Breugel
Nabeel Seedat
F. Imrie
M. Schaar
25 Oct 2023
K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chaewon Park
Soohwan Kim
Kyubyong Park
Kunwoo Park
24 Oct 2023
Meta learning with language models: Challenges and opportunities in the classification of imbalanced text
Apostol T. Vassilev
Honglan Jin
Munawar Hasan
23 Oct 2023
Towards General Error Diagnosis via Behavioral Testing in Machine Translation
Junjie Wu
Lemao Liu
Dit-Yan Yeung
20 Oct 2023
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs
Chenyang Yang
Rishabh Rustogi
Rachel A. Brower-Sinning
Grace A. Lewis
Jane Hsieh
Tongshuang Wu
14 Oct 2023
How toxic is antisemitism? Potentials and limitations of automated toxicity scoring for antisemitic online content
Helena Mihaljević
Elisabeth Steffen
05 Oct 2023
Can Language Models be Instructed to Protect Personal Information?
Yang Chen
Ethan Mendes
Sauvik Das
Wei Xu
Alan Ritter
03 Oct 2023
No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
02 Oct 2023
Towards a Unified Framework for Adaptable Problematic Content Detection via Continual Learning
Ali Omrani
Alireza S. Ziabari
Preni Golazizian
Jeffery Sorensen
Morteza Dehghani
29 Sep 2023
Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content
Charles O'Neill
Jack Miller
I. Ciucă
Y. Ting 丁
Thang Bui
26 Aug 2023
An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software
International Conference on Automated Software Engineering (ASE), 2023
Wenxuan Wang
Jingyuan Huang
Shu Yang
Chang Chen
Jiazhen Gu
Pinjia He
Michael R. Lyu
18 Aug 2023
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
IEEE Symposium on Security and Privacy (IEEE S&P), 2023
Xinlei He
Savvas Zannettou
Yun Shen
Yang Zhang
10 Aug 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
02 Aug 2023
DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures
Workshop on Trolling, Aggression and Cyberbullying (TRAC), 2023
Angus R. Williams
Hannah Rose Kirk
L. Burke
Yi-Ling Chung
Ivan Debono
Pica Johansson
Francesca Stevens
Jonathan Bright
Scott A. Hale
31 Jul 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
23 Jul 2023
Evaluating AI systems under uncertain ground truth: a case study in dermatology
David Stutz
A. Cemgil
Abhijit Guha Roy
Tatiana Matejovicova
Melih Barsbey
...
Yossi Matias
Pushmeet Kohli
Yao Xiao
Arnaud Doucet
Alan Karthikesalingam
05 Jul 2023
Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers
I. Nejadgholi
S. Kiritchenko
Kathleen C. Fraser
Esma Balkir
04 Jul 2023
A Weakly Supervised Classifier and Dataset of White Supremacist Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Michael Miller Yoder
Ahmad Diab
D. W. Brown
Kathleen M. Carley
27 Jun 2023
Politeness Stereotypes and Attack Vectors: Gender Stereotypes in Japanese and Korean Language Models
Victor Steinborn
Antonis Maronikolakis
Hinrich Schütze
16 Jun 2023
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data
Janis Goldzycher
Moritz Preisig
Chantal Amrhein
Gerold Schneider
06 Jun 2023
COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xuhui Zhou
Haojie Zhu
Akhila Yerukola
Thomas Davidson
Jena D. Hwang
Swabha Swayamdipta
Maarten Sap
03 Jun 2023
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
Knowledge Discovery and Data Mining (KDD), 2023
Atharva Kulkarni
Sarah Masud
Vikram Goyal
Tanmoy Chakraborty
01 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
01 Jun 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
28 May 2023
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Deokjae Lee
JunYeong Lee
Jung-Woo Ha
Jin-Hwa Kim
Sang-Woo Lee
Hwaran Lee
Hyun Oh Song
27 May 2023
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Julia Mendelsohn
Ronan Le Bras
Yejin Choi
Maarten Sap
26 May 2023
Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Isabelle Lorge
J. Pierrehumbert
25 May 2023
How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have
International Conference on Language Resources and Evaluation (LREC), 2023
Viktor Hangya
Kangyang Luo
23 May 2023
Validating Multimedia Content Moderation Software via Semantic Fusion
International Symposium on Software Testing and Analysis (ISSTA), 2023
Wenxuan Wang
Jingyuan Huang
Chang Chen
Jiazhen Gu
Jianping Zhang
Weibin Wu
Pinjia He
Michael Lyu
23 May 2023