Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech

1 September 2021

Papers citing "Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech"

Title | Authors | Counts | Date
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Xinyue Shen, Yixin Wu, Y. Qu, Michael Backes, Savvas Zannettou, Yang Zhang | 120 7 0 | 28 Jan 2025
A Target-Aware Analysis of Data Augmentation for Hate Speech Detection | Camilla Casula, Sara Tonelli | 54 1 0 | 10 Oct 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information | Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Congrui Huang | 148 7 0 | 23 Sep 2024
Shortchanged: Uncovering and Analyzing Intimate Partner Financial Abuse in Consumer Complaints | Arkaprabha Bhattacharya, Kevin Lee, Vineeth Ravi, Jessica Staddon, Rosanna Bellini | 23 2 0 | 20 Mar 2024
Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge | Shi Yin Hong, Susan Gauch | 62 2 0 | 24 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings | Sagi Pendzel, Tomer Wullach, Amir Adler, Einat Minkov | 60 11 0 | 16 Nov 2023
Simple synthetic data reduces sycophancy in large language models | Jerry W. Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le | 114 74 0 | 07 Aug 2023
Detecting Multidimensional Political Incivility on Social Media | Sagi Pendzel, Nir Lotan, Alon Zoizner, Einat Minkov | 29 1 0 | 24 May 2023
Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection | Rabiul Awal, Roy Ka-wei Lee, Eshaan Tanwar, Tanmay Garg, Tanmoy Chakraborty | 73 28 0 | 04 Mar 2023
State-of-the-art generalisation research in NLP: A taxonomy and review | Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin | 270 99 0 | 06 Oct 2022
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice | Mohit Singhal, Chen Ling, Pujan Paudel, Poojitha Thota, Nihal Kumarswamy, Gianluca Stringhini, Shirin Nilizadeh | 156 33 0 | 29 Jun 2022
CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection | Souvic Chakraborty, Parag Dutta, Sumegh Roychowdhury, Animesh Mukherjee | 31 8 0 | 13 Apr 2022
Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab | Abraham Israeli, Oren Tsur | 80 1 0 | 27 Jan 2022
Character-level HyperNetworks for Hate Speech Detection | Tomer Wullach, A. Adler, Einat Minkov | 61 14 0 | 11 Nov 2021