HateCheck: Functional Tests for Hate Speech Detection Models


31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

50 / 143 papers shown
System Prompt Optimization with Meta-Learning
Yumin Choi
Jinheon Baek
Sung Ju Hwang
LLMAG
48
0
0
14 May 2025
Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study
Faeze Ghorbanpour
Daryna Dementieva
Alexander M. Fraser
40
0
0
09 May 2025
SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal
Hari Shrawgi
Parag Agrawal
Sandipan Dandapat
ELM
47
0
0
28 Apr 2025
Towards a comprehensive taxonomy of online abusive language informed by machine leaning
Samaneh Hosseini Moghaddam
Kelly Lyons
Cheryl Regehr
Vivek Goel
Kaitlyn Regehr
23
0
0
24 Apr 2025
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection
Myrthe Reuver
Indira Sen
Matteo Melis
Gabriella Lapesa
20
0
0
21 Apr 2025
A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English
Julian Bäumler
Louis Blöcher
Lars-Joel Frey
Xian Chen
Markus Bayer
Christian A. Reuter
AILaw
44
0
0
11 Apr 2025
AutoTestForge: A Multidimensional Automated Testing Framework for Natural Language Processing Models
Hengrui Xing
Cong Tian
L. Zhao
Z. Ma
WenSheng Wang
N. Zhang
Chao Huang
Zhenhua Duan
47
0
0
07 Mar 2025
Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
David Hartmann
Amin Oueslati
Dimitri Staufer
Lena Pohlmann
Simon Munzert
Hendrik Heuer
48
0
0
03 Mar 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation
Shiza Ali
Jeremy Blackburn
Gianluca Stringhini
59
0
0
24 Feb 2025
Echoes of Discord: Forecasting Hater Reactions to Counterspeech
Xiaoying Song
Sharon Lisseth Perez
Xinchen Yu
Eduardo Blanco
Lingzi Hong
113
0
0
17 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-Wei Lee
VLM
75
0
0
16 Feb 2025
SubData: A Python Library to Collect and Combine Datasets for Evaluating LLM Alignment on Downstream Tasks
Leon Fröhling
Pietro Bernardelle
Gianluca Demartini
ALM
74
0
0
21 Dec 2024
A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages
Susmita Das
Arpita Dutta
Kingshuk Roy
Abir Mondal
Arnab Mukhopadhyay
66
0
0
28 Nov 2024
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
Manuel Tonneau
Diyi Liu
Niyati Malhotra
Scott A. Hale
Samuel Fraiberger
Victor Orozco-Olvera
Paul Röttger
71
0
0
23 Nov 2024
DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
44
1
0
21 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Bjorn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
28
1
0
17 Oct 2024
BenchmarkCards: Large Language Model and Risk Reporting
Anna Sokol
Nuno Moniz
Elizabeth M. Daly
Michael Hind
Nitesh V. Chawla
31
0
0
16 Oct 2024
Disentangling Hate Across Target Identities
Yiping Jin
Leo Wanner
Aneesh Moideen Koya
23
0
0
14 Oct 2024
A Target-Aware Analysis of Data Augmentation for Hate Speech Detection
Camilla Casula
Sara Tonelli
26
0
0
10 Oct 2024
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
46
6
0
04 Oct 2024
AggregHate: An Efficient Aggregative Approach for the Detection of Hatemongers on Social Platforms
Tom Marzea
Abraham Israeli
Oren Tsur
23
0
0
22 Sep 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang
Yining Hong
Grace A. Lewis
Tongshuang Wu
Christian Kastner
38
1
0
14 Sep 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
52
1
0
05 Sep 2024
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
Raoyuan Zhao
Abdullatif Köksal
Yihong Liu
Leonie Weissweiler
Anna Korhonen
Hinrich Schütze
SyDa
36
1
0
30 Aug 2024
Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
Swabha Swayamdipta
35
3
0
26 Aug 2024
Decoding Climate Disagreement: A Graph Neural Network-Based Approach to Understanding Social Media Dynamics
Ruiran Su
J. Pierrehumbert
24
0
0
09 Jul 2024
JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin
Shiyi Liu
Haotian Li
Xun Zhao
Huamin Qu
34
3
0
03 Jul 2024
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
Xavier Suau
Pieter Delobelle
Katherine Metcalf
Armand Joulin
N. Apostoloff
Luca Zappella
P. Rodríguez
MU
AAML
32
8
0
02 Jul 2024
CELL your Model: Contrastive Explanations for Large Language Models
Ronny Luss
Erik Miehling
Amit Dhurandhar
40
0
0
17 Jun 2024
Sexism Detection on a Data Diet
Rabiraj Bandyopadhyay
Dennis Assenmacher
J. Alonso-Moral
Claudia Wagner
41
0
0
07 Jun 2024
Prompt Exploration with Prompt Regression
Michael Feffer
Ronald Xu
Yuekai Sun
Mikhail Yurochkin
22
0
0
17 May 2024
Mitigating Exaggerated Safety in Large Language Models
Ruchi Bhalani
Ruchira Ray
21
1
0
08 May 2024
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
Ri Chi Ng
Nirmalendu Prakash
Ming Shan Hee
K. T. W. Choo
Roy Ka-Wei Lee
35
4
0
03 May 2024
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Manuel Tonneau
Diyi Liu
Samuel Fraiberger
Ralph Schroeder
Scott A. Hale
Paul Röttger
27
5
0
27 Apr 2024
Analyzing Toxicity in Deep Conversations: A Reddit Case Study
Vigneshwaran Shankaran
Rajesh Sharma
28
1
0
11 Apr 2024
NLP for Counterspeech against Hate: A Survey and How-To Guide
Helena Bonaldi
Yi-Ling Chung
Gavin Abercrombie
Marco Guerini
AAML
31
13
0
29 Mar 2024
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Janis Goldzycher
Paul Röttger
Gerold Schneider
AAML
29
8
0
28 Mar 2024
NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data
Manuel Tonneau
Pedro Vitor Quinta de Castro
Karim Lasri
I. Farouq
Lakshminarayanan Subramanian
Victor Orozco-Olvera
Samuel Fraiberger
36
9
0
28 Mar 2024
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
H. Nghiem
Hal Daumé
31
1
0
18 Mar 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
Lei Gao
Yue Niu
Tingting Tang
A. Avestimehr
Murali Annavaram
MU
32
10
0
13 Mar 2024
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
56
0
0
13 Mar 2024
Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection
Tharindu Kumarage
Amrita Bhattacharjee
Joshua Garland
39
7
0
12 Mar 2024
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Yiping Jin
Leo Wanner
A. Shvets
21
2
0
23 Feb 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
44
7
0
03 Feb 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
30
66
0
29 Jan 2024
Towards a Non-Ideal Methodological Framework for Responsible ML
Ramaravind Kommiya Mothilal
Shion Guha
Syed Ishtiaque Ahmed
32
7
0
20 Jan 2024
Muted: Multilingual Targeted Offensive Speech Identification and Visualization
Christoph Tillmann
Aashka Trivedi
Sara Rosenthal
Santosh Borse
Rong Zhang
Avirup Sil
Bishwaranjan Bhattacharjee
8
2
0
18 Dec 2023
Causal ATE Mitigates Unintended Bias in Controlled Text Generation
Rahul Madhavan
Kahini Wadhawan
21
0
0
19 Nov 2023
Functionality learning through specification instructions
Pedro Henrique Luz de Araujo
Benjamin Roth
ELM
33
0
0
14 Nov 2023
People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection
Indira Sen
Dennis Assenmacher
Mattia Samory
Isabelle Augenstein
Wil M.P. van der Aalst
Claudia Wagner
17
19
0
02 Nov 2023