NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models

18 March 2023
Yiran Ye
Thai Le
Dongwon Lee
Communities: AAML, DeLMO
Abstract

Online texts with toxic content are a clear threat to users on social media in particular and to society in general. Although many platforms have adopted various measures (e.g., machine learning-based hate-speech detection systems) to diminish their effect, toxic content writers have also attempted to evade such measures by using cleverly modified toxic words, so-called human-written text perturbations. To help build automatic detection tools that recognize those perturbations, prior work has developed sophisticated techniques to generate diverse adversarial samples. However, we note that these "algorithm"-generated perturbations do not necessarily capture all the traits of "human"-written perturbations. Therefore, in this paper, we introduce a novel, high-quality dataset of human-written perturbations, named NoisyHate, created from real-life perturbations that are both written and verified by humans in the loop. We show that perturbations in NoisyHate have different characteristics than those in prior algorithm-generated toxic datasets, and thus can be particularly useful for developing better toxic speech detection solutions. We thoroughly validate NoisyHate against state-of-the-art language models, such as BERT and RoBERTa, and black-box APIs, such as Perspective API, on two tasks: perturbation normalization and understanding.
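As a rough illustration of the robustness gap the abstract describes, the sketch below compares an off-the-shelf toxicity classifier's score on a clean toxic sentence against a human-style perturbed spelling of the same sentence. The model name (unitary/toxic-bert) and the example sentence pairs are assumptions chosen for illustration, not part of the NoisyHate evaluation setup.

# Minimal sketch: compare a toxicity classifier's confidence on clean text
# versus a human-style perturbed rewrite of the same text.
# Assumptions (not from the paper): the model "unitary/toxic-bert" and the
# example pairs below are illustrative stand-ins for NoisyHate entries.
from transformers import pipeline

# Off-the-shelf toxicity classifier; any text-classification model would do.
clf = pipeline("text-classification", model="unitary/toxic-bert")

pairs = [
    # (clean toxic text, human-written perturbation of the same text)
    ("you are an idiot", "y0u are an id!ot"),
    ("shut up, loser", "shut up, l0s3r"),
]

for clean, perturbed in pairs:
    clean_pred = clf(clean)[0]
    pert_pred = clf(perturbed)[0]
    # A large drop in the toxicity score on the perturbed version is the kind
    # of evasion that NoisyHate is designed to benchmark.
    print(f"clean:     {clean!r} -> {clean_pred['label']} ({clean_pred['score']:.2f})")
    print(f"perturbed: {perturbed!r} -> {pert_pred['label']} ({pert_pred['score']:.2f})")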

@article{ye2025_2303.10430,
  title={NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models},
  author={Yiran Ye and Thai Le and Dongwon Lee},
  journal={arXiv preprint arXiv:2303.10430},
  year={2025}
}