Handling Bias in Toxic Speech Detection: A Survey

26 January 2022

Papers citing "Handling Bias in Toxic Speech Detection: A Survey"

46 / 46 papers shown

Tackling Social Bias against the Poor: A Dataset and Taxonomy on Aporophobia Georgina Curto S. Kiritchenko Muhammad Hammad Fahim Siddiqui I. Nejadgholi Kathleen C. Fraser 26 0 0 17 Apr 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models Aryan Shrivastava Paula Akemi Aoyagui 29 0 0 14 Apr 2025
Redefining Toxicity: An Objective and Context-Aware Approach for Stress-Level-Based Detection Sergey Berezin R. Farahbakhsh Noel Crespi 53 0 0 20 Mar 2025
Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations David Hartmann Amin Oueslati Dimitri Staufer Lena Pohlmann Simon Munzert Hendrik Heuer 50 0 0 03 Mar 2025
Safe Spaces or Toxic Places? Content Moderation and Social Dynamics of Online Eating Disorder Communities Kristina Lerman Minh Duc Hoang Chu Charles Bickham Luca Luceri Emilio Ferrara AI4MH 85 0 0 20 Dec 2024
ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data? Zheng Hui Zhaoxiao Guo Hang Zhao Juanyong Duan Lin Ai Yinheng Li Julia Hirschberg Congrui Huang 85 1 0 18 Nov 2024
Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language Xinmeng Hou 24 1 0 17 Oct 2024
Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets Tommaso Giorgi Lorenzo Cima T. Fagni M. Avvenuti S. Cresci 42 9 0 10 Oct 2024
Hate Personified: Investigating the role of LLMs in content moderation Sarah Masud Sahajpreet Singh Viktor Hangya Alexander Fraser Tanmoy Chakraborty 30 7 0 03 Oct 2024
Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity Johannes Schneider Arianna Casanova Flores Anne-Catherine Kranz 50 2 0 08 Jul 2024
Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services David Hartmann Amin Oueslati Dimitri Staufer MLAU 35 1 0 20 Jun 2024
Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology Federico Ruggeri Eleonora Misino Arianna Muti Katerina Korre Paolo Torroni Alberto Barrón-Cedeño 39 0 0 20 Jun 2024
Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities Delfina Sol Martinez Pandiani Erik Tjong Kim Sang Davide Ceolin 29 2 0 11 Jun 2024
Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech Neemesh Yadav Sarah Masud Vikram Goyal Vikram Goyal Md. Shad Akhtar Tanmoy Chakraborty 28 3 0 06 Jun 2024
Hate Speech Detection with Generalizable Target-aware Fairness Tong Chen Danny Wang Xurong Liang Marten Risius Gianluca Demartini Hongzhi Yin 35 3 0 28 May 2024
FUGNN: Harmonizing Fairness and Utility in Graph Neural Networks Renqiang Luo Huafei Huang Shuo Yu Zhuoyang Han Estrid He Xiuzhen Zhang Feng Xia 34 3 0 27 May 2024
Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models Paula Akemi Aoyagui Sharon Ferguson Anastasia Kuzminykh 50 0 0 17 May 2024
Algorithmic Fairness: A Tolerance Perspective Renqiang Luo Tao Tang Feng Xia Jiaying Liu Chengpei Xu Leo Yu Zhang Wei Xiang Chengqi Zhang FaML 74 0 0 26 Apr 2024
NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps Kristina Gligorić Myra Cheng Lucia Zheng Esin Durmus Dan Jurafsky 45 9 0 02 Apr 2024
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models Hyunbyung Park Sukyung Lee Gyoungjin Gim Yungi Kim Dahyun Kim Chanjun Park VLM 36 0 0 28 Mar 2024
Legally Binding but Unfair? Towards Assessing Fairness of Privacy Policies Vincent Freiberger Erik Buchmann AILaw 32 5 0 12 Mar 2024
Don't Blame the Data, Blame the Model: Understanding Noise and Bias When Learning from Subjective Annotations Abhishek Anand Negar Mokhberian Prathyusha Naresh Kumar Anweasha Saha Zihao He Ashwin Rao Fred Morstatter Kristina Lerman 36 6 0 06 Mar 2024
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges Aiqi Jiang A. Zubiaga AAML 31 3 0 17 Jan 2024
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates Aida Mostafazadeh Davani Mark Díaz Dylan K. Baker Vinodkumar Prabhakaran AAML 25 14 0 11 Dec 2023
LifeTox: Unveiling Implicit Toxicity in Life Advice Minbeom Kim Jahyun Koo Hwanhee Lee Joonsuk Park Hwaran Lee Kyomin Jung 13 6 0 16 Nov 2023
A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity Wenbo Zhang Hangzhi Guo Ian D Kivlichan Vinodkumar Prabhakaran Davis Yadav Amulya Yadav 23 2 0 07 Nov 2023
On the definition of toxicity in NLP Sergey Berezin R. Farahbakhsh Noel Crespi 21 0 0 03 Oct 2023
Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Speech Detection Sarah Masud Ashutosh Bajpai Tanmoy Chakraborty 13 0 0 21 Sep 2023
BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service Anna Kołos Inez Okulska Kinga Głąbińska Agnieszka Karlinska Emilia Wisnios Paweł Ellerik Andrzej Prałat 11 1 0 21 Aug 2023
Causality Guided Disentanglement for Cross-Platform Hate Speech Detection Paras Sheth Tharindu Kumarage Raha Moraffah Amanat Chadha Huan Liu 29 7 0 03 Aug 2023
Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts Shaina Raza Chen Ding D. Pandya FaML 16 2 0 14 Jul 2023
Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships David Jurgens Agrima Seth Jack E. Sargent Athena Aghighi Michael Geraci 22 7 0 06 Jul 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety Lora Aroyo Alex S. Taylor Mark Díaz Christopher Homan Alicia Parrish Greg Serapio-García Vinodkumar Prabhakaran Ding Wang 29 33 0 20 Jun 2023
PaLM 2 Technical Report Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin ... Ce Zheng Wei Zhou Denny Zhou Slav Petrov Yonghui Wu ReLM LRM 92 1,148 0 17 May 2023
Lightweight Toxicity Detection in Spoken Language: A Transformer-based Approach for Edge Devices Ahlam Husni Abu Nada S. Latif Junaid Qadir 20 4 0 22 Apr 2023
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network Sreyan Ghosh Manan Suri Purva Chiniya Utkarsh Tyagi Sonal Kumar Dinesh Manocha 27 12 0 02 Mar 2023
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models Rafal Kocielnik Shrimai Prabhumoye Vivian Zhang Roy Jiang R. Alvarez Anima Anandkumar 41 6 0 14 Feb 2023
Scaling Instruction-Finetuned Language Models Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay ... Jacob Devlin Adam Roberts Denny Zhou Quoc V. Le Jason W. Wei ReLM LRM 62 2,989 0 20 Oct 2022
A Review of Challenges in Machine Learning based Automated Hate Speech Detection Abhishek Velankar H. Patil Raviraj Joshi 32 8 0 12 Sep 2022
Representation Bias in Data: A Survey on Identification and Resolution Techniques N. Shahbazi Yin Lin Abolfazl Asudeh H. V. Jagadish 40 67 0 22 Mar 2022
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances Sreyan Ghosh Samden Lepcha S. Sakshi R. Shah S. Umesh 16 14 0 14 Oct 2021
Challenges in Detoxifying Language Models Johannes Welbl Amelia Glaese J. Uesato Sumanth Dathathri John F. J. Mellor Lisa Anne Hendricks Kirsty Anderson Pushmeet Kohli Ben Coppin Po-Sen Huang LM&MA 250 193 0 15 Sep 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech Mai Elsherief Caleb Ziems D. Muchlinski Vaishnavi Anupindi Jordyn Seybolt M. D. Choudhury Diyi Yang 103 236 0 11 Sep 2021
Towards generalisable hate speech detection: a review on obstacles and solutions Wenjie Yin A. Zubiaga 117 164 0 17 Feb 2021
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments Alexandra Chouldechova FaML 207 2,084 0 24 Oct 2016
Efficient Estimation of Word Representations in Vector Space Tomáš Mikolov Kai Chen G. Corrado J. Dean 3DV 245 31,257 0 16 Jan 2013