Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

5 September 2023

Papers citing "Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization"


Title
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering
Helena Bonaldi, Greta Damo, Nicolás Benjamín Ocampo, Elena Cabrio, S. Villata, Marco Guerini
30 4 0 04 Oct 2024

NLP for Counterspeech against Hate: A Survey and How-To Guide
Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini
AAML 31 13 0 29 Mar 2024

A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models
Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun
40 3 0 18 Feb 2024

Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
KELM 189 261 0 28 Apr 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM ALM 303 11,881 0 04 Mar 2022

Deep Reinforcement Learning for Dialogue Generation
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky
198 1,325 0 05 Jun 2016