HateCheck: Functional Tests for Hate Speech Detection Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

31 December 2020

Paul Röttger

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

50 / 162 papers shown

Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection

Mithun Das

Saurabh Kumar Pandey

Animesh Mukherjee

301

22 May 2023

Cross-functional Analysis of Generalisation in Behavioural Learning

Transactions of the Association for Computational Linguistics (TACL), 2023

Pedro Henrique Luz de Araujo

Benjamin Roth

186

22 May 2023

Angler: Helping Machine Translation Practitioners Prioritize Model Improvements

International Conference on Human Factors in Computing Systems (CHI), 2023

Dominik Moritz

246

12 Apr 2023

Interpretable Unified Language Checking

203

07 Apr 2023

Sociocultural knowledge is needed for selection of shots in hate speech detection tasks

Antonis Maronikolakis

Abdullatif Köksal

Hinrich Schütze

374

04 Apr 2023

Assessing Language Model Deployment with Risk Cards

Leon Derczynski

Hannah Rose Kirk

Vidhisha Balachandran

267

31 Mar 2023

A Federated Approach for Hate Speech Detection

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

129

18 Feb 2023

Auditing large language models: a three-layered approach

AI and Ethics (AE), 2023

486

270

16 Feb 2023

Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

The Web Conference (WWW), 2023

Soumyajit Gupta

Sooyong Lee

Maria De-Arteaga

Matthew Lease

234

14 Feb 2023

BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models

Ali Borji

CoGe

140

28 Jan 2023

Can Large Language Models Change User Preference Adversarially?

Varshini Subhash

AAML

183

05 Jan 2023

Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI

Lorena Piedras

Lucas Rosenblatt

Julia Wilkins

268

05 Jan 2023

Evaluating Psychological Safety of Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

229

20 Dec 2022

Manifestations of Xenophobia in AI Systems

AI & Society (AS), 2022

Nenad Tomašev

J. L. Maynard

Iason Gabriel

405

15 Dec 2022

Human-in-the-Loop Hate Speech Classification in a Multilingual Context

216

05 Dec 2022

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Jiale Cheng

Lifeng Shang

368

04 Dec 2022

Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning

Md. Tawkat Islam Khondaker

Muhammad Abdul-Mageed

L. Lakshmanan

11 Nov 2022

CoRAL: a Context-aware Croatian Abusive Language Dataset

Ravi Shekhar

Mladen Karan

Matthew Purver

229

11 Nov 2022

NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Saadia Gabriel

Hamid Palangi

Yejin Choi

AAML

244

08 Nov 2022

System Demo: Tool and Infrastructure for Offensive Language Error Analysis (OLEA) in English

28 Oct 2022

"It's Not Just Hate'': A Multi-Dimensional Perspective on Detecting Harmful Speech OnlineConference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Federico Bianchi

S. A. Hills

Patrícia G. C. Rossini

Dirk Hovy

Rebekah Tromble

N. Tintarev

193

28 Oct 2022

Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

Syrielle Montariol

Arij Riabi

Djamé Seddah

332

24 Oct 2022

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Paul Röttger

Debora Nozza

Federico Bianchi

Dirk Hovy

192

20 Oct 2022

Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022

190

19 Oct 2022

TestAug: A Framework for Augmenting Capability-based NLP Tests

International Conference on Computational Linguistics (COLING), 2022

Xueqing Liu

183

14 Oct 2022

The State of Profanity Obfuscation in Natural Language Processing

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Debora Nozza

Dirk Hovy

240

14 Oct 2022

Quantifying Social Biases Using Templates is Unreliable

P. Seshadri

Pouya Pezeshkpour

Sameer Singh

193

09 Oct 2022

Explainable Abuse Detection as Intent Classification and Slot Filling

Transactions of the Association for Computational Linguistics (TACL), 2022

Agostina Calabrese

Bjorn Ross

Mirella Lapata

203

06 Oct 2022

Hypothesis Engineering for Zero-Shot Hate Speech Detection

Workshop on Trolling, Aggression and Cyberbullying (TRAC), 2022

Janis Goldzycher

Gerold Schneider

227

03 Oct 2022

Debiasing Word Embeddings with Nonlinear Geometry

International Conference on Computational Linguistics (COLING), 2022

Lu Cheng

Nayoung Kim

Huan Liu

188

29 Aug 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Deep Ganguli

...

603

633

23 Aug 2022

A Holistic Approach to Undesired Content Detection in the Real World

AAAI Conference on Artificial Intelligence (AAAI), 2022

Tyna Eloundou

259

341

05 Aug 2022

Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions

129

30 Jun 2022

Flexible text generation for counterfactual fairness probing

122

28 Jun 2022

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Paul Röttger

179

20 Jun 2022

Adversarial Text Normalization

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

187

08 Jun 2022

Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models

289

08 Jun 2022

Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

236

09 May 2022

Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

189

06 May 2022

Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation

265

110

01 May 2022

HateCheckHIn: Evaluating Hindi Hate Speech Detection Models

International Conference on Language Resources and Evaluation (LREC), 2022

222

30 Apr 2022

Handling and Presenting Harmful Text in NLP Research

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

290

29 Apr 2022

Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection

Pedro Henrique Luz de Araujo

Benjamin Roth

136

08 Apr 2022

PaLM: Scaling Language Modeling with Pathways

Journal of Machine Learning Research (JMLR), 2022

Sharan Narang

...

Kathy Meier-Hellstern

1.2K

7,457

05 Apr 2022

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

Lifeng Shang

Xin Jiang

Shiqi Zhao

Qun Liu

ALM

340

31 Mar 2022

Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments

Findings (Findings), 2022

Antonis Maronikolakis

211

22 Mar 2022

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

396

510

17 Mar 2022

Red Teaming Language Models with Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Saffron Huang

448

862

07 Feb 2022

Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab

Abraham Israeli

Oren Tsur

155

27 Jan 2022

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Paul Röttger

Bertie Vidgen

Dirk Hovy

J. Pierrehumbert

144

14 Dec 2021