2508.07063
Cited By
Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach
9 August 2025
Naseem Machlovi
Maryam Saleki
Innocent Ababio
Ruhul Amin
Papers citing "Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach"
2 / 2 papers shown
Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework
Mahmoud El-Bahnasawi
08 Nov 2025
Scaling behavior of large language models in emotional safety classification across sizes and tasks
Edoardo Pinzuti
Oliver Tüscher
André Ferreira Castro
AI4MH
02 Sep 2025