Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

9 May 2022

Papers citing "Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection"

NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers | Salvatore Greco, Ke Zhou, L. Capra, Tania Cerquitelli, Daniele Quercia | 29 2 0 | 01 Jul 2024
Hate Speech Detection with Generalizable Target-aware Fairness | Tong Chen, Danny Wang, Xurong Liang, Marten Risius, Gianluca Demartini, Hongzhi Yin | 19 3 0 | 28 May 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias | Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu | LRM 19 36 0 | 01 Apr 2024
Unlock the Potential of Counterfactually-Augmented Data in Out-Of-Distribution Generalization | Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin | CML 15 4 0 | 10 Oct 2023
On the Challenges of Building Datasets for Hate Speech Detection | Vitthal Bhandari | 6 1 0 | 06 Sep 2023
Hate Speech Detection via Dual Contrastive Learning | Junyu Lu, Ho-Yi Lin, Xiaokun Zhang, Zhaoqing Li, Tongyue Zhang, Linlin Zong, Fenglong Ma, Bo Xu | 15 17 0 | 10 Jul 2023
Causal Effect Regularization: Automated Detection and Removal of Spurious Attributes | Abhinav Kumar, Amit Deshpande, Ajay Sharma | CML 9 1 0 | 19 Jun 2023
Probing Classifiers are Unreliable for Concept Removal and Detection | Abhinav Kumar, Chenhao Tan, Amit Sharma | AAML 13 20 0 | 08 Jul 2022