ResearchTrend.AI
HateCheck: Functional Tests for Hate Speech Detection Models

31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

50 / 143 papers shown
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data
B. V. Breugel
Nabeel Seedat
F. Imrie
M. Schaar
SyDa
24
19
0
25 Oct 2023
K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings
Chaewon Park
Soohwan Kim
Kyubyong Park
Kunwoo Park
19
4
0
24 Oct 2023
Meta learning with language models: Challenges and opportunities in the classification of imbalanced text
Apostol T. Vassilev
Honglan Jin
Munawar Hasan
6
0
0
23 Oct 2023
Towards General Error Diagnosis via Behavioral Testing in Machine Translation
Junjie Wu
Lemao Liu
Dit-Yan Yeung
24
2
0
20 Oct 2023
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs
Chenyang Yang
Rishabh Rustogi
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
KELM
30
11
0
14 Oct 2023
How toxic is antisemitism? Potentials and limitations of automated toxicity scoring for antisemitic online content
Helena Mihaljević
Elisabeth Steffen
9
2
0
05 Oct 2023
Can Language Models be Instructed to Protect Personal Information?
Yang Chen
Ethan Mendes
Sauvik Das
Wei-ping Xu
Alan Ritter
PILM
19
34
0
03 Oct 2023
No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
14
3
0
02 Oct 2023
Towards a Unified Framework for Adaptable Problematic Content Detection via Continual Learning
Ali Omrani
Alireza S. Ziabari
Preni Golazizian
Jeffery Sorensen
Morteza Dehghani
19
1
0
29 Sep 2023
Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content
Charles O'Neill
Jack Miller
I. Ciucă
Y. Ting 丁
Thang Bui
23
3
0
26 Aug 2023
An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software
Wenxuan Wang
Jingyuan Huang
Jen-tse Huang
Chang Chen
Jiazhen Gu
Pinjia He
Michael R. Lyu
VLM
28
6
0
18 Aug 2023
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
Xinlei He
Savvas Zannettou
Yun Shen
Yang Zhang
CLL
13
37
0
10 Aug 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
21
122
0
02 Aug 2023
DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures
Angus R. Williams
Hannah Rose Kirk
L. Burke
Yi-Ling Chung
Ivan Debono
Pica Johansson
Francesca Stevens
Jonathan Bright
Scott A. Hale
26
1
0
31 Jul 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
35
3
0
23 Jul 2023
Evaluating AI systems under uncertain ground truth: a case study in dermatology
David Stutz
A. Cemgil
Abhijit Guha Roy
Tatiana Matejovicova
Melih Barsbey
...
Yossi Matias
Pushmeet Kohli
Yun-hui Liu
Arnaud Doucet
Alan Karthikesalingam
25
4
0
05 Jul 2023
Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers
I. Nejadgholi
S. Kiritchenko
Kathleen C. Fraser
Esma Balkir
21
0
0
04 Jul 2023
A Weakly Supervised Classifier and Dataset of White Supremacist Language
Michael Miller Yoder
Ahmad Diab
D. W. Brown
Kathleen M. Carley
19
5
0
27 Jun 2023
Politeness Stereotypes and Attack Vectors: Gender Stereotypes in Japanese and Korean Language Models
Victor Steinborn
Antonis Maronikolakis
Hinrich Schütze
16
0
0
16 Jun 2023
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data
Janis Goldzycher
Moritz Preisig
Chantal Amrhein
Gerold Schneider
21
3
0
06 Jun 2023
COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements
Xuhui Zhou
Haojie Zhu
Akhila Yerukola
Thomas Davidson
Jena D. Hwang
Swabha Swayamdipta
Maarten Sap
19
33
0
03 Jun 2023
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
Atharva Kulkarni
Sarah Masud
Vikram Goyal
Tanmoy Chakraborty
18
9
0
01 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
15
5
0
01 Jun 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
30
28
0
28 May 2023
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Deokjae Lee
JunYeong Lee
Jung-Woo Ha
Jin-Hwa Kim
Sang-Woo Lee
Hwaran Lee
Hyun Oh Song
AAML
19
23
0
27 May 2023
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Julia Mendelsohn
Ronan Le Bras
Yejin Choi
Maarten Sap
21
25
0
26 May 2023
Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
Isabelle Lorge
J. Pierrehumbert
31
0
0
25 May 2023
How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have
Viktor Hangya
Alexander M. Fraser
26
0
0
23 May 2023
Validating Multimedia Content Moderation Software via Semantic Fusion
Wenxuan Wang
Jingyuan Huang
Chang Chen
Jiazhen Gu
Jianping Zhang
Weibin Wu
Pinjia He
Michael Lyu
60
9
0
23 May 2023
Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection
Mithun Das
Saurabh Kumar Pandey
Animesh Mukherjee
41
10
0
22 May 2023
Cross-functional Analysis of Generalisation in Behavioural Learning
Pedro Henrique Luz de Araujo
Benjamin Roth
10
3
0
22 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
25
15
0
12 Apr 2023
Interpretable Unified Language Checking
Tianhua Zhang
Hongyin Luo
Yung-Sung Chuang
Wei Fang
Luc Gaitskell
Thomas Hartvigsen
Xixin Wu
D. Fox
Helen M. Meng
James R. Glass
27
22
0
07 Apr 2023
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
Antonis Maronikolakis
Abdullatif Köksal
Hinrich Schütze
32
0
0
04 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
20
42
0
31 Mar 2023
A Federated Approach for Hate Speech Detection
Jay Gala
Deep Gandhi
Jash Mehta
Zeerak Talat
13
4
0
18 Feb 2023
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
34
194
0
16 Feb 2023
Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection
Soumyajit Gupta
Sooyong Lee
Maria De-Arteaga
Matthew Lease
8
13
0
14 Feb 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
10
1
0
28 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
24
8
0
05 Jan 2023
Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI
Lorena Piedras
Lucas Rosenblatt
Julia Wilkins
26
9
0
05 Jan 2023
Evaluating Psychological Safety of Large Language Models
Xingxuan Li
Yutong Li
Linlin Liu
Shafiq R. Joty
Lidong Bing
LM&MA
23
21
0
20 Dec 2022
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
24
9
0
15 Dec 2022
Human-in-the-Loop Hate Speech Classification in a Multilingual Context
Ana Kotarcic
Dominik Hangartner
Fabrizio Gilardi
Selina Kurer
K. Donnay
24
2
0
05 Dec 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Zhexin Zhang
Jiale Cheng
Hao-Lun Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Minlie Huang
SILM
18
8
0
04 Dec 2022
Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning
Md. Tawkat Islam Khondaker
Muhammad Abdul-Mageed
L. Lakshmanan
12
1
0
11 Nov 2022
CoRAL: a Context-aware Croatian Abusive Language Dataset
Ravi Shekhar
Mladen Karan
Matthew Purver
33
5
0
11 Nov 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
35
1
0
08 Nov 2022
System Demo: Tool and Infrastructure for Offensive Language Error Analysis (OLEA) in English
M. Grace
Xajavion Jay Seabrum
Dananjay Srinivas
Alexis Palmer
29
0
0
28 Oct 2022
"It's Not Just Hate": A Multi-Dimensional Perspective on Detecting Harmful Speech Online
Federico Bianchi
S. A. Hills
Patrícia G. C. Rossini
Dirk Hovy
Rebekah Tromble
N. Tintarev
28
14
0
28 Oct 2022