Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach

9 August 2025
Naseem Machlovi
Maryam Saleki
Innocent Ababio
Ruhul Amin
arXiv: 2508.07063

Papers citing "Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach"

Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework
Mahmoud El-Bahnasawi
08 Nov 2025
Scaling behavior of large language models in emotional safety classification across sizes and tasks
Edoardo Pinzuti
Oliver Tüscher
André Ferreira Castro
AI4MH
02 Sep 2025