arXiv:2508.07063

Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach

9 August 2025

Naseem Machlovi

Innocent Ababio

Papers citing "Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach"

2 / 2 papers shown

Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework

Mahmoud El-Bahnasawi

08 Nov 2025

Scaling behavior of large language models in emotional safety classification across sizes and tasks

Edoardo Pinzuti

Oliver Tüscher

André Ferreira Castro

02 Sep 2025