Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering

4 October 2024

Papers citing "Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering"

Title: A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
Authors: Gabriel Chua, Shing Yee Chan, Shaun Khoo
Date: 20 Nov 2024