HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

18 December 2020

Papers citing "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection"

50 / 280 papers shown

FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs S. Kadhe Anisa Halimi Ambrish Rawat Nathalie Baracaldo MU 22 7 0 12 Dec 2023
Toxic language detection: a systematic review of Arabic datasets Imene Bensalem Paolo Rosso Hanane Zitouni 32 4 0 12 Dec 2023
A Text-to-Text Model for Multilingual Offensive Language Identification Tharindu Ranasinghe Marcos Zampieri 27 3 0 06 Dec 2023
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation Randall Balestriero Romain Cosentino Sarath Shekkizhar 28 2 0 04 Dec 2023
Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge Shi Yin Hong Susan Gauch 40 2 0 24 Nov 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study Maike Züfle Verna Dankers Ivan Titov 47 0 0 16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings Sagi Pendzel Tomer Wullach Amir Adler Einat Minkov 33 11 0 16 Nov 2023
Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection Sarah Masud Mohammad Aflah Khan Md. Shad Akhtar Tanmoy Chakraborty 32 3 0 16 Nov 2023
The Uli Dataset: An Exercise in Experience Led Annotation of oGBV Arnav Arora Maha Jinadoss Cheshta Arora Denny George Brindaalakshmi ... Ambika Tandon Rishav Thakker Rahul Dev Korra Aatman Vaidya Tarunima Prabhakar 26 1 0 15 Nov 2023
Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models Carlos Alejandro Aguirre Kuleen Sasse Isabel Cachola Mark Dredze 32 1 0 14 Nov 2023
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model Minh-Hao Van Xintao Wu VLM MLLM 37 10 0 12 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives Vinodkumar Prabhakaran Christopher Homan Lora Aroyo Aida Mostafazadeh Davani Alicia Parrish Alex S. Taylor Mark Díaz Ding Wang Greg Serapio-García 47 9 0 09 Nov 2023
Factoring Hate Speech: A New Annotation Framework to Study Hate Speech in Social Media Gal Ron Effi Levi Odelia Oshri Shaul R. Shenhav 25 2 0 07 Nov 2023
Explainable Identification of Hate Speech towards Islam using Graph Neural Networks Azmine Toushik Wasi 33 0 0 02 Nov 2023
HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning Yongjin Yang Joonkee Kim Yujin Kim Namgyu Ho James Thorne Se-Young Yun 27 21 0 01 Nov 2023
Text-Transport: Toward Learning Causal Effects of Natural Language Victoria Lin Louis-Philippe Morency Eli Ben-Michael 6 4 0 31 Oct 2023
On the Interplay between Fairness and Explainability Stephanie Brandl Emanuele Bugliarello Ilias Chalkidis FaML 27 4 0 25 Oct 2023
K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings Chaewon Park Soohwan Kim Kyubyong Park Kunwoo Park 35 4 0 24 Oct 2023
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research Dimosthenis Antypas Asahi Ushio Francesco Barbieri Leonardo Neves Kiamehr Rezaee Luis Espinosa-Anke Jiaxin Pei Jose Camacho-Collados 35 9 0 23 Oct 2023
Probing LLMs for hate speech detection: strengths and vulnerabilities Sarthak Roy Ashish Harshavardhan Animesh Mukherjee Punyajoy Saha 63 33 0 19 Oct 2023
Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale Qichao Wang Tian Bian Yian Yin Tingyang Xu Hong Cheng Helen M. Meng Zibin Zheng Liang Chen Bingzhe Wu VLM DiffM 36 3 0 18 Oct 2023
VIBE: Topic-Driven Temporal Adaptation for Twitter Classification Yuji Zhang Jing Li Wenjie Li VLM 32 11 0 16 Oct 2023
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations Nils Feldhus Qianli Wang Tatiana Anikina Sahil Chopra Cennet Oguz Sebastian Möller 42 11 0 09 Oct 2023
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation Aman Khullar Daniel K. Nkemelu Cuong V. Nguyen Michael L. Best 45 2 0 04 Oct 2023
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation Wen Wu Wenlin Chen Chuxu Zhang P. Woodland 21 1 0 30 Sep 2023
Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Speech Detection Sarah Masud Ashutosh Bajpai Tanmoy Chakraborty 13 0 0 21 Sep 2023
Zero-Shot Robustification of Zero-Shot Models Dyah Adila Changho Shin Lin Cai Frederic Sala 51 19 0 08 Sep 2023
On the Challenges of Building Datasets for Hate Speech Detection Vitthal Bhandari 20 1 0 06 Sep 2023
Explainability for Large Language Models: A Survey Haiyan Zhao Hanjie Chen Fan Yang Ninghao Liu Huiqi Deng Hengyi Cai Shuaiqiang Wang Dawei Yin Mengnan Du LRM 36 415 0 02 Sep 2023
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis Nayeon Lee Chani Jung Jun-Hee Myung Jiho Jin Jose Camacho-Collados Juho Kim Alice Oh 52 14 0 31 Aug 2023
CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias Vipul Gupta Pranav Narayanan Venkit Hugo Laurençon Shomir Wilson R. Passonneau 48 12 0 24 Aug 2023
A Survey on Fairness in Large Language Models Yingji Li Mengnan Du Rui Song Xin Wang Ying Wang ALM 57 60 0 20 Aug 2023
An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software Wenxuan Wang Jingyuan Huang Jen-tse Huang Chang Chen Jiazhen Gu Pinjia He Michael R. Lyu VLM 36 6 0 18 Aug 2023
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models Ziyu Zhuang Qiguang Chen Longxuan Ma Mingda Li Yi Han Yushan Qian Haopeng Bai Zixian Feng Weinan Zhang Ting Liu ELM 31 9 0 15 Aug 2023
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content Xinlei He Savvas Zannettou Yun Shen Yang Zhang CLL 29 37 0 10 Aug 2023
Causality Guided Disentanglement for Cross-Platform Hate Speech Detection Paras Sheth Tharindu Kumarage Raha Moraffah Amanat Chadha Huan Liu 34 8 0 03 Aug 2023
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution Ehsan Kamalloo A. Jafari Xinyu Crystina Zhang Nandan Thakur Jimmy J. Lin 32 42 0 31 Jul 2023
On the Learning Dynamics of Attention Networks Rahul Vashisht H. G. Ramaswamy 13 1 0 25 Jul 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies Jiangrui Zheng Xueqing Liu Guanqun Yang Mirazul Haque Xing Qian Ravishka Rathnasuriya Wei Yang G. Budhrani 52 3 0 23 Jul 2023
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media Liam Hebert Gaurav Sahu Yuxuan Guo Nanda Kishore Sreenivas Lukasz Golab Robin Cohen 23 10 0 18 Jul 2023
Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation Dimosthenis Antypas Jose Camacho-Collados 53 23 0 04 Jul 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety Lora Aroyo Alex S. Taylor Mark Díaz Christopher Homan Alicia Parrish Greg Serapio-García Vinodkumar Prabhakaran Ding Wang 34 33 0 20 Jun 2023
Cross-Domain Toxic Spans Detection Stefan F. Schouten Baran Barbarestani Wondimagegnhue Tufa Piek Vossen I. Markov 18 2 0 16 Jun 2023
PEACE: Cross-Platform Hate Speech Detection- A Causality-guided Framework Paras Sheth Tharindu Kumarage Raha Moraffah Amanat Chadha Huan Liu 36 7 0 15 Jun 2023
Strategies to exploit XAI to improve classification systems Andrea Apicella Luca Di Lorenzo Francesco Isgrò A. Pollastro R. Prevete 11 9 0 09 Jun 2023
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition Ali Modarressi Mohsen Fayyaz Ehsan Aghazadeh Yadollah Yaghoobzadeh Mohammad Taher Pilehvar 38 26 0 05 Jun 2023
Being Right for Whose Right Reasons? Terne Sasha Thorn Jakobsen Laura Cabello Anders Søgaard 39 10 0 01 Jun 2023
Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models Pranath Reddy Kumbam Sohaib Uddin Syed Prashanth Thamminedi S. Harish Ian Perera Bonnie J. Dorr AAML 29 1 0 29 May 2023
Evaluating GPT-3 Generated Explanations for Hateful Content Moderation H. Wang Ming Shan Hee Rabiul Awal K. T. W. Choo Roy Ka-wei Lee 24 42 0 28 May 2023
Detecting Multidimensional Political Incivility on Social Media Sagi Pendzel Nir Lotan Alon Zoizner Einat Minkov 19 1 0 24 May 2023