HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation

6 July 2025

Naquee Rizwan

Seid Muhie Yimam

Daryna Dementieva

Florian Skupin

Tim Fischer

Daniil Moskovskiy

Aarushi Ajay Borkar

Robert Geislinger

Punyajoy Saha

Sarthak Roy

Martin Semmann

Alexander Panchenko

Chris Biemann

Animesh Mukherjee


Main: 9 Pages

8 Figures

Bibliography: 5 Pages

1 Table

Appendix: 1 Page

Abstract

Despite regulations imposed by nations and social media platforms (e.g., Government of India, 2021; European Parliament and Council of the European Union, 2022, inter alia), hateful content persists as a significant challenge. Existing approaches primarily rely on reactive measures such as blocking or suspending offensive messages, while emerging strategies focus on proactive measures such as detoxification and counterspeech. In our work, which we call HatePRISM, we conduct a comprehensive examination of hate speech regulations and strategies from three perspectives: country regulations, social platform policies, and NLP research datasets. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions and platforms, alongside a lack of alignment with research efforts. Based on these insights, we suggest ideas and research directions for further exploration of a unified framework for automated hate speech moderation that incorporates diverse strategies.
