Decoding Hate: Exploring Language Models' Reactions to Hate Speech

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

1 October 2024

Papers citing "Decoding Hate: Exploring Language Models' Reactions to Hate Speech"

Evaluating Large Language Models for Detecting Antisemitism
Jay Patel, Hrudayangam Mehta, Jeremy Blackburn
22 Sep 2025

WATCHED: A Web AI Agent Tool for Combating Hate Speech by Expanding Data
Paloma Piot, Diego Sánchez, Javier Parapar
01 Sep 2025

Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
Mikel K. Ngueajio, Flor Miriam Plaza del Arco, Yi-Ling Chung, D. Rawat, Amanda Cercas Curry
04 Jun 2025

Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models
Paloma Piot, Patricia Martín-Rodilla, Javier Parapar
04 May 2025

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
08 Apr 2024