Contextualizing Hate Speech Classifiers with Post-hoc Explanation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
5 May 2020
Brendan Kennedy
Xisen Jin
Aida Mostafazadeh Davani
Morteza Dehghani
Xiang Ren

Papers citing "Contextualizing Hate Speech Classifiers with Post-hoc Explanation"

50 / 83 papers shown
Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs
Dzmitry Pihulski
Jan Kocoń
150
0
0
27 Sep 2025
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
Yifan Wang
Mayank Jobanputra
Ji-Ung Lee
Soyoung Oh
Isabel Valera
Vera Demberg
271
1
0
26 Sep 2025
MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Multi-hop Hate Speech Explanation
Jackson Trager
Diego Alves
Matteo Guida
Mikel K. Ngueajio
Flor Miriam Plaza del Arco
Yalda Daryanai
Farzan Karimi-Malekabadi
Francielle Vargas
LRM
316
0
0
23 Jun 2025
Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models
Valerie Krug
Sebastian Stober
313
0
0
04 Jun 2025
Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Orfeas Menis Mastromichalakis
Jason Liartis
Kristina Rose
Antoine Isaac
Giorgos Stamou
KELM
190
0
0
30 May 2025
On Fairness of Task Arithmetic: The Role of Task Vectors
Hiroki Naganuma
Kotaro Yoshida
Laura Gomezjurado Gonzalez
Takafumi Horie
Yuji Naraki
Ryotaro Shimizu
MoMe
252
3
0
30 May 2025
Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data
Faeze Ghorbanpour
Daryna Dementieva
Kangyang Luo
392
0
0
20 May 2025
Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
Vasiliki Papanikou
Danae Pla Karidi
E. Pitoura
Emmanouil Panagiotou
Eirini Ntoutsi
523
2
0
01 May 2025
U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario
Jiaxin Song
Xinyu Wang
Yihao Wang
Yifan Tang
Ru Zhang
Jianyi Liu
Gongshen Liu
AAML
291
2
0
03 Jan 2025
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain
Vikram Krishnamurthy
LLMAG
566
0
0
02 Nov 2024
A Target-Aware Analysis of Data Augmentation for Hate Speech Detection
Camilla Casula
Sara Tonelli
292
0
0
10 Oct 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
455
16
0
11 Jul 2024
Hate Speech Detection with Generalizable Target-aware Fairness
Tong Chen
Danny Wang
Xurong Liang
Marten Risius
Gianluca Demartini
Hongzhi Yin
470
13
0
28 May 2024
Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse
Abinew Ali Ayele
Esubalew Alemneh Jalew
Adem Chanie Ali
Seid Muhie Yimam
Christian Biemann
194
6
0
18 Apr 2024
ToXCL: A Unified Framework for Toxic Speech Detection and Explanation
Nhat M. Hoang
Do Xuan Long
Duc Anh Do
Duc Anh Vu
Anh Tuan Luu
845
12
0
25 Mar 2024
Recourse for reclamation: Chatting with generative language models
Jennifer Chien
Kevin R. McKee
Jackie Kay
William S. Isaac
229
0
0
21 Mar 2024
Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models
Ming Shan Hee
Shivam Sharma
Rui Cao
Palash Nandi
Tanmoy Chakraborty
Roy Ka-wei Lee
251
4
0
30 Jan 2024
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
Aiqi Jiang
A. Zubiaga
AAML
385
7
0
17 Jan 2024
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
283
0
0
16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
218
16
0
16 Nov 2023
Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection
Sarah Masud
Mohammad Aflah Khan
Md. Shad Akhtar
Tanmoy Chakraborty
262
6
0
16 Nov 2023
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization
Conference on Computational Natural Language Learning (CoNLL), 2023
Mohammad Reza Ghasemi Madani
Pasquale Minervini
275
5
0
22 Oct 2023
Towards a Unified Framework for Adaptable Problematic Content Detection via Continual Learning
Ali Omrani
Alireza S. Ziabari
Preni Golazizian
Jeffery Sorensen
Morteza Dehghani
278
2
0
29 Sep 2023
Hateful Messages: A Conversational Data Set of Hate Speech produced by Adolescents on Discord
Jan Fillies
Silvio Peikert
Adrian Paschke
151
7
0
04 Sep 2023
Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
Pranav Narayanan Venkit
Sanjana Gautam
Ruchi Panchanadikar
Tingting Huang
Shomir Wilson
188
31
0
08 Aug 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
467
298
0
02 Aug 2023
Sociodemographic Bias in Language Models: A Survey and Forward Path
Vipul Gupta
Pranav Narayanan Venkit
Shomir Wilson
R. Passonneau
534
34
0
13 Jun 2023
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data
Janis Goldzycher
Moritz Preisig
Chantal Amrhein
Gerold Schneider
242
4
0
06 Jun 2023
Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
International Conference on Web and Social Media (ICWSM), 2023
Pranath Reddy Kumbam
Sohaib Uddin Syed
Prashanth Thamminedi
S. Harish
Ian Perera
Bonnie J. Dorr
AAML
197
3
0
29 May 2023
Should We Attend More or Less? Modulating Attention for Fairness
A. Zayed
Gonçalo Mordido
Samira Shabanian
Sarath Chandar
341
16
0
22 May 2023
Analyzing Norm Violations in Live-Stream Chat
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jihyung Moon
Dong-Ho Lee
Hyundong Justin Cho
Woojeong Jin
Chan Young Park
MinWoo Kim
Jonathan May
Jay Pujara
Sungjoon Park
310
9
0
18 May 2023
HateMM: A Multi-Modal Dataset for Hate Video Classification
International Conference on Web and Social Media (ICWSM), 2023
Mithun Das
R. Raj
Punyajoy Saha
Binny Mathew
Manish Gupta
Animesh Mukherjee
240
67
0
06 May 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
International Workshop on Semantic Evaluation (SemEval), 2023
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
347
148
0
07 Mar 2023
Explaining text classifiers through progressive neighborhood approximation with realistic samples
Yi Cai
Arthur Zimek
Eirini Ntoutsi
Gerhard Wunder
AI4TS
233
1
0
11 Feb 2023
Nationality Bias in Text Generation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Pranav Narayanan Venkit
Sanjana Gautam
Ruchi Panchanadikar
Ting-Hao 'Kenneth' Huang
Shomir Wilson
460
76
0
05 Feb 2023
Language Model Detoxification in Dialogue with Contextualized Stance Control
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jingu Qian
Xifeng Yan
232
3
0
25 Jan 2023
XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Dong-Ho Lee
Akshen Kadakia
Brihi Joshi
Aaron Chan
Ziyi Liu
...
Takashi Shibuya
Ryosuke Mitani
Toshiyuki Sekiya
Jay Pujara
Xiang Ren
LRM
249
11
0
30 Oct 2022
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
364
18
0
24 Oct 2022
TCAB: A Large-Scale Text Classification Attack Benchmark
Kalyani Asthana
Zhouhang Xie
Wencong You
Adam Noack
Jonathan Brophy
Sameer Singh
Daniel Lowd
335
3
0
21 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
236
13
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
222
5
0
19 Oct 2022
Assessing Out-of-Domain Language Model Performance from Few Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Prasann Singhal
Jarad Forristal
Xi Ye
Greg Durrett
LRM
231
6
0
13 Oct 2022
From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Lei Li
Yankai Lin
Xuancheng Ren
Guangxiang Zhao
Peng Li
Jie Zhou
Xu Sun
VLM
199
2
0
11 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Transactions of the Association for Computational Linguistics (TACL), 2022
Agostina Calabrese
Bjorn Ross
Mirella Lapata
260
13
0
06 Oct 2022
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
International Conference on Computational Linguistics (COLING), 2022
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
301
1
0
18 Sep 2022
Power of Explanations: Towards automatic debiasing in hate speech detection
International Conference on Data Science and Advanced Analytics (DSAA), 2022
Yitao Cai
Arthur Zimek
Gerhard Wunder
Eirini Ntoutsi
181
9
0
07 Sep 2022
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
Neural Information Processing Systems (NeurIPS), 2022
Zhuofan Ying
Peter Hase
Joey Tianyi Zhou
LRM
335
15
0
22 Jun 2022
Enriching Abusive Language Detection with Community Context
Jana Kurrek
Haji Mohammad Saleem
D. Ruths
209
6
0
16 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
335
42
0
08 Jun 2022
ER-Test: Evaluating Explanation Regularization Methods for Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Brihi Joshi
Aaron Chan
Ziyi Liu
Shaoliang Nie
Maziar Sanjabi
Hamed Firooz
Xiang Ren
AAML
415
7
0
25 May 2022