Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

19 March 2024

Amrita Bhattacharjee

Huan Liu

Papers citing "Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales"

Scalable Evaluation of Online Moderation Strategies via Synthetic Simulations. Dimitris Tsirmpas, Ion Androutsopoulos, John Pavlopoulos. 13 Mar 2025.
EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces. G. Mujtaba, Sunder Ali Khowaja, K. Dev. 28 Feb 2025.
Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization. Anwar Hossain Zahid, Monoshi Kumar Roy, Swarna Das. 26 Feb 2025.
Towards Efficient and Explainable Hate Speech Detection via Model Distillation. Paloma Piot, Javier Parapar. 18 Dec 2024.
AggregHate: An Efficient Aggregative Approach for the Detection of Hatemongers on Social Platforms. Tom Marzea, Abraham Israeli, Oren Tsur. 22 Sep 2024.
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement. Paras Sheth, Tharindu Kumarage, Raha Moraffah, Amanat Chadha, Huan Liu. 17 Apr 2024.
Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. 22 Mar 2023.
Deep Learning for Hate Speech Detection: A Comparative Study. Jitendra Malik, Hezhe Qiao, Guansong Pang, A. Hengel. 19 Feb 2022.
Language Models as Knowledge Bases? Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. 03 Sep 2019.