Automated Hate Speech Detection and the Problem of Offensive Language

11 March 2017

Thomas Davidson

Papers citing "Automated Hate Speech Detection and the Problem of Offensive Language"

Title
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection Yejin Lee Joonghyuk Hahn Hyeseon Ahn Yo-Sub Han 32 0 0 26 May 2025
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs Muhammad Farid Adilazuarda Chen Cecilia Liu Iryna Gurevych Alham Fikri Aji 80 0 0 22 May 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation Shiza Ali Jeremy Blackburn Gianluca Stringhini 73 0 0 24 Feb 2025
Echoes of Discord: Forecasting Hater Reactions to Counterspeech Xiaoying Song Sharon Lisseth Perez Xinchen Yu Eduardo Blanco Lingzi Hong 353 0 0 17 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet Berk Atil Vipul Gupta Sarkar Snigdha Sarathi Das R. Passonneau 324 0 0 07 Feb 2025
TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection Yang Cao Sikun Yang Chen Li Haolong Xiang Lianyong Qi Bo Liu Rongsheng Li Ming Liu 71 0 0 21 Jan 2025
DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition? Urja Khurana Eric T. Nalisnick Antske Fokkens 81 1 0 21 Oct 2024
SLM-Mod: Small Language Models Surpass LLMs at Content Moderation Xianyang Zhan Agam Goyal Yilun Chen Eshwar Chandrasekharan Koustuv Saha AI4MH 340 3 0 17 Oct 2024
TaeBench: Improving Quality of Toxic Adversarial Examples Xuan Zhu Dmitriy Bespalov Liwen You Ninad Kulkarni Yanjun Qi AAML 76 0 0 08 Oct 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information Zheng Hui Zhaoxiao Guo Hang Zhao Juanyong Duan Congrui Huang 68 7 0 23 Sep 2024
Identity-related Speech Suppression in Generative AI Content Moderation Oghenefejiro Isaacs Anigboro Charlie M. Crawford Danaë Metaxa Sorelle A. Friedler 56 0 0 09 Sep 2024
A Causal Framework for Evaluating Deferring Systems Filippo Palomba Andrea Pugnana Jose M. Alvarez Salvatore Ruggieri CML 69 4 0 29 May 2024
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets Manuel Tonneau Diyi Liu Samuel Fraiberger Ralph Schroeder Scott A. Hale Paul Röttger 44 6 0 27 Apr 2024
Specification Overfitting in Artificial Intelligence Benjamin Roth Pedro Henrique Luz de Araujo Yuxi Xia Saskia Kaltenbrunner Christoph Korab 117 1 0 13 Mar 2024
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection Ahmad Nasir Aadish Sharma Kokil Jaidka Saifuddin Ahmed 51 3 0 29 Oct 2023
AtteSTNet -- An attention and subword tokenization based approach for code-switched text hate speech detection Geet Shingi Vedangi Wagh 88 0 0 10 Dec 2021