Improving Harmful Text Detection with Joint Retrieval and External Knowledge

3 April 2025
Zidong Yu
Shuo Wang
Nan Jiang
Weiqiang Huang
Xu Han
Junliang Du
Abstract

Harmful text detection has become a crucial task in the development and deployment of large language models, especially as AI-generated content continues to expand across digital platforms. This study proposes a joint retrieval framework that integrates pre-trained language models with knowledge graphs to improve the accuracy and robustness of harmful text detection. Experimental results demonstrate that the joint retrieval approach significantly outperforms single-model baselines, particularly in low-resource training scenarios and multilingual environments. The proposed method effectively captures nuanced harmful content by leveraging external contextual information, addressing the limitations of traditional detection models. Future research should focus on optimizing computational efficiency, enhancing model interpretability, and expanding multimodal detection capabilities to better tackle evolving harmful content patterns. This work contributes to the advancement of AI safety, ensuring more trustworthy and reliable content moderation systems.
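The abstract does not specify the implementation, but the described approach (a pre-trained language model combined with retrieval over an external knowledge graph) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the encoder choice, the toy verbalized knowledge-graph triples, the cosine-similarity retriever, and the simple concatenation-based fusion head are not the authors' method.

# Minimal sketch of a retrieval-augmented harmful-text classifier, assuming
# "joint retrieval" roughly means: encode the input with a pre-trained LM,
# retrieve the most similar knowledge-graph facts from an external store by
# embedding similarity, and classify the fused representation.
# Model name, triple store, and fusion head are illustrative assumptions.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

ENCODER = "sentence-transformers/all-MiniLM-L6-v2"  # assumed encoder choice

tokenizer = AutoTokenizer.from_pretrained(ENCODER)
encoder = AutoModel.from_pretrained(ENCODER)

# Toy external knowledge base: verbalized knowledge-graph triples.
KG_TRIPLES = [
    "slur -> targets -> protected group",
    "threat of violence -> implies -> physical harm",
    "medical misinformation -> causes -> real-world harm",
    "sarcastic praise -> is not -> harassment",
]

@torch.no_grad()
def embed(texts):
    """Mean-pooled sentence embeddings from the pre-trained encoder."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)              # (B, H)

KG_EMB = embed(KG_TRIPLES)

def retrieve(text_emb, k=2):
    """Return the k knowledge triples closest to the input embedding."""
    sims = F.cosine_similarity(text_emb, KG_EMB)              # (num_triples,)
    top = sims.topk(k).indices.tolist()
    return [KG_TRIPLES[i] for i in top]

class JointClassifier(torch.nn.Module):
    """Fuses text and retrieved-knowledge embeddings for a binary decision."""
    def __init__(self, hidden=384):
        super().__init__()
        self.head = torch.nn.Linear(2 * hidden, 2)  # harmful vs. benign

    def forward(self, text_emb, kg_emb):
        return self.head(torch.cat([text_emb, kg_emb], dim=-1))

def classify(text, model):
    text_emb = embed([text])
    kg_texts = retrieve(text_emb)
    kg_emb = embed(kg_texts).mean(0, keepdim=True)  # pool retrieved evidence
    logits = model(text_emb, kg_emb)
    return logits.softmax(-1)

model = JointClassifier()  # untrained; weights would come from supervised fine-tuning
print(classify("I will hurt you if you come here again", model))

In practice the classification head would be trained on labeled harmful/benign examples; the point of the sketch is only the data flow of joint retrieval, where external knowledge supplements the input text before the final decision.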

@article{yu2025_2504.02310,
  title={Improving Harmful Text Detection with Joint Retrieval and External Knowledge},
  author={Zidong Yu and Shuo Wang and Nan Jiang and Weiqiang Huang and Xu Han and Junliang Du},
  journal={arXiv preprint arXiv:2504.02310},
  year={2025}
}