2012.10289
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
18 December 2020
Binny Mathew
Punyajoy Saha
Seid Muhie Yimam
Chris Biemann
Pawan Goyal
Animesh Mukherjee
Papers citing
"HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection"
50 / 280 papers shown
Title
FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs
S. Kadhe
Anisa Halimi
Ambrish Rawat
Nathalie Baracaldo
MU
22
7
0
12 Dec 2023
Toxic language detection: a systematic review of Arabic datasets
Imene Bensalem
Paolo Rosso
Hanane Zitouni
32
4
0
12 Dec 2023
A Text-to-Text Model for Multilingual Offensive Language Identification
Tharindu Ranasinghe
Marcos Zampieri
27
3
0
06 Dec 2023
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
Randall Balestriero
Romain Cosentino
Sarath Shekkizhar
28
2
0
04 Dec 2023
Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge
Shi Yin Hong
Susan Gauch
40
2
0
24 Nov 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Züfle
Verna Dankers
Ivan Titov
47
0
0
16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
33
11
0
16 Nov 2023
Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection
Sarah Masud
Mohammad Aflah Khan
Md. Shad Akhtar
Tanmoy Chakraborty
32
3
0
16 Nov 2023
The Uli Dataset: An Exercise in Experience Led Annotation of oGBV
Arnav Arora
Maha Jinadoss
Cheshta Arora
Denny George
Brindaalakshmi
...
Ambika Tandon
Rishav Thakker
Rahul Dev Korra
Aatman Vaidya
Tarunima Prabhakar
26
1
0
15 Nov 2023
Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models
Carlos Alejandro Aguirre
Kuleen Sasse
Isabel Cachola
Mark Dredze
32
1
0
14 Nov 2023
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model
Minh-Hao Van
Xintao Wu
VLM
MLLM
37
10
0
12 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
47
9
0
09 Nov 2023
Factoring Hate Speech: A New Annotation Framework to Study Hate Speech in Social Media
Gal Ron
Effi Levi
Odelia Oshri
Shaul R. Shenhav
25
2
0
07 Nov 2023
Explainable Identification of Hate Speech towards Islam using Graph Neural Networks
Azmine Toushik Wasi
33
0
0
02 Nov 2023
HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning
Yongjin Yang
Joonkee Kim
Yujin Kim
Namgyu Ho
James Thorne
Se-Young Yun
27
21
0
01 Nov 2023
Text-Transport: Toward Learning Causal Effects of Natural Language
Victoria Lin
Louis-Philippe Morency
Eli Ben-Michael
6
4
0
31 Oct 2023
On the Interplay between Fairness and Explainability
Stephanie Brandl
Emanuele Bugliarello
Ilias Chalkidis
FaML
27
4
0
25 Oct 2023
K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings
Chaewon Park
Soohwan Kim
Kyubyong Park
Kunwoo Park
35
4
0
24 Oct 2023
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Dimosthenis Antypas
Asahi Ushio
Francesco Barbieri
Leonardo Neves
Kiamehr Rezaee
Luis Espinosa-Anke
Jiaxin Pei
Jose Camacho-Collados
35
9
0
23 Oct 2023
Probing LLMs for hate speech detection: strengths and vulnerabilities
Sarthak Roy
Ashish Harshavardhan
Animesh Mukherjee
Punyajoy Saha
63
33
0
19 Oct 2023
Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale
Qichao Wang
Tian Bian
Yian Yin
Tingyang Xu
Hong Cheng
Helen M. Meng
Zibin Zheng
Liang Chen
Bingzhe Wu
VLM
DiffM
36
3
0
18 Oct 2023
VIBE: Topic-Driven Temporal Adaptation for Twitter Classification
Yuji Zhang
Jing Li
Wenjie Li
VLM
32
11
0
16 Oct 2023
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations
Nils Feldhus
Qianli Wang
Tatiana Anikina
Sahil Chopra
Cennet Oguz
Sebastian Möller
42
11
0
09 Oct 2023
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation
Aman Khullar
Daniel K. Nkemelu
Cuong V. Nguyen
Michael L. Best
45
2
0
04 Oct 2023
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation
Wen Wu
Wenlin Chen
Chuxu Zhang
P. Woodland
21
1
0
30 Sep 2023
Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Speech Detection
Sarah Masud
Ashutosh Bajpai
Tanmoy Chakraborty
13
0
0
21 Sep 2023
Zero-Shot Robustification of Zero-Shot Models
Dyah Adila
Changho Shin
Lin Cai
Frederic Sala
51
19
0
08 Sep 2023
On the Challenges of Building Datasets for Hate Speech Detection
Vitthal Bhandari
20
1
0
06 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Mengnan Du
LRM
36
415
0
02 Sep 2023
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis
Nayeon Lee
Chani Jung
Jun-Hee Myung
Jiho Jin
Jose Camacho-Collados
Juho Kim
Alice Oh
52
14
0
31 Aug 2023
CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias
Vipul Gupta
Pranav Narayanan Venkit
Hugo Laurençon
Shomir Wilson
R. Passonneau
48
12
0
24 Aug 2023
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
57
60
0
20 Aug 2023
An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software
Wenxuan Wang
Jingyuan Huang
Jen-tse Huang
Chang Chen
Jiazhen Gu
Pinjia He
Michael R. Lyu
VLM
36
6
0
18 Aug 2023
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
Ziyu Zhuang
Qiguang Chen
Longxuan Ma
Mingda Li
Yi Han
Yushan Qian
Haopeng Bai
Zixian Feng
Weinan Zhang
Ting Liu
ELM
31
9
0
15 Aug 2023
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
Xinlei He
Savvas Zannettou
Yun Shen
Yang Zhang
CLL
29
37
0
10 Aug 2023
Causality Guided Disentanglement for Cross-Platform Hate Speech Detection
Paras Sheth
Tharindu Kumarage
Raha Moraffah
Amanat Chadha
Huan Liu
34
8
0
03 Aug 2023
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
Ehsan Kamalloo
A. Jafari
Xinyu Crystina Zhang
Nandan Thakur
Jimmy J. Lin
32
42
0
31 Jul 2023
On the Learning Dynamics of Attention Networks
Rahul Vashisht
H. G. Ramaswamy
13
1
0
25 Jul 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
52
3
0
23 Jul 2023
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media
Liam Hebert
Gaurav Sahu
Yuxuan Guo
Nanda Kishore Sreenivas
Lukasz Golab
Robin Cohen
23
10
0
18 Jul 2023
Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation
Dimosthenis Antypas
Jose Camacho-Collados
53
23
0
04 Jul 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety
Lora Aroyo
Alex S. Taylor
Mark Díaz
Christopher Homan
Alicia Parrish
Greg Serapio-García
Vinodkumar Prabhakaran
Ding Wang
34
33
0
20 Jun 2023
Cross-Domain Toxic Spans Detection
Stefan F. Schouten
Baran Barbarestani
Wondimagegnhue Tufa
Piek Vossen
I. Markov
18
2
0
16 Jun 2023
PEACE: Cross-Platform Hate Speech Detection- A Causality-guided Framework
Paras Sheth
Tharindu Kumarage
Raha Moraffah
Amanat Chadha
Huan Liu
36
7
0
15 Jun 2023
Strategies to exploit XAI to improve classification systems
Andrea Apicella
Luca Di Lorenzo
Francesco Isgrò
A. Pollastro
R. Prevete
11
9
0
09 Jun 2023
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition
Ali Modarressi
Mohsen Fayyaz
Ehsan Aghazadeh
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
38
26
0
05 Jun 2023
Being Right for Whose Right Reasons?
Terne Sasha Thorn Jakobsen
Laura Cabello
Anders Søgaard
39
10
0
01 Jun 2023
Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
Pranath Reddy Kumbam
Sohaib Uddin Syed
Prashanth Thamminedi
S. Harish
Ian Perera
Bonnie J. Dorr
AAML
29
1
0
29 May 2023
Evaluating GPT-3 Generated Explanations for Hateful Content Moderation
H. Wang
Ming Shan Hee
Rabiul Awal
K. T. W. Choo
Roy Ka-wei Lee
24
42
0
28 May 2023
Detecting Multidimensional Political Incivility on Social Media
Sagi Pendzel
Nir Lotan
Alon Zoizner
Einat Minkov
19
1
0
24 May 2023