Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation

The proliferation of social media platforms has led to an increase in the spread of hate speech, particularly targeting vulnerable communities. Unfortunately, existing methods for automatically identifying and blocking toxic language rely on pre-constructed lexicons, making them reactive rather than adaptive. As a result, these approaches become less effective over time, especially when new communities are targeted with slurs that were not included in the original datasets. To address this issue, we present an adaptive approach that uses word embeddings to update lexicons, and we develop a hybrid model that adjusts to emerging slurs and new linguistic patterns. This approach can effectively detect toxic language, including intentional spelling mistakes employed by aggressors to avoid detection. Our hybrid model, which combines BERT with lexicon-based techniques, achieves an accuracy of 95% on most state-of-the-art datasets. Our work has significant implications for creating safer online environments by improving the detection of toxic content and by proactively updating the lexicon. Content Warning: This paper contains examples of hate speech that may be triggering.
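
To illustrate the lexicon-update step, the sketch below shows one way emerging terms could be surfaced with word embeddings. This is a minimal sketch using gensim, not the paper's implementation; the function name expand_lexicon, the similarity threshold, and the file path are illustrative assumptions.

    # Minimal sketch of embedding-based lexicon expansion (names and
    # thresholds are illustrative, not taken from the paper).
    from gensim.models import KeyedVectors

    def expand_lexicon(seed_lexicon, embeddings, top_n=10, threshold=0.6):
        # Propose candidate terms whose embeddings lie close to known
        # lexicon entries; candidates would still need human review.
        candidates = set()
        for term in seed_lexicon:
            if term not in embeddings:
                continue
            for neighbor, score in embeddings.most_similar(term, topn=top_n):
                if score >= threshold and neighbor not in seed_lexicon:
                    candidates.add(neighbor)
        return candidates

    # Usage (path and lexicon contents are placeholders):
    # embeddings = KeyedVectors.load("platform_embeddings.kv")
    # new_terms = expand_lexicon({"known_slur_1", "known_slur_2"}, embeddings)

Embeddings trained on recent platform data often place deliberate misspellings near their canonical forms, which is one plausible route to catching evasive spellings.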
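A hedged sketch of the hybrid scoring idea follows, combining a lexicon hit with a transformer classifier's confidence. The model name unitary/toxic-bert and the linear weighting scheme are assumptions chosen for illustration, not the paper's reported configuration.

    # Hedged sketch of a hybrid lexicon + transformer score (the model
    # choice and the weighting are assumptions, not the paper's setup).
    from transformers import pipeline

    # Off-the-shelf toxicity classifier; any fine-tuned BERT variant could stand in.
    classifier = pipeline("text-classification", model="unitary/toxic-bert")

    def hybrid_score(text, lexicon, lexicon_weight=0.4):
        # Lexicon signal: 1.0 if any known term appears in the text, else 0.0.
        tokens = set(text.lower().split())
        lex_signal = 1.0 if tokens & lexicon else 0.0
        # Model signal: classifier confidence that the text is toxic.
        pred = classifier(text)[0]
        model_signal = pred["score"] if pred["label"] == "toxic" else 1.0 - pred["score"]
        return lexicon_weight * lex_signal + (1.0 - lexicon_weight) * model_signal

A simple linear combination like this keeps the lexicon's fast, interpretable signal while letting the contextual model handle phrasing the lexicon misses.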
@article{ali2025_2502.10921,
  title={Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation},
  author={Shiza Ali and Jeremy Blackburn and Gianluca Stringhini},
  journal={arXiv preprint arXiv:2502.10921},
  year={2025}
}