Like trainer, like bot? Inheritance of bias in algorithmic content moderation

5 July 2017

Reuben Binns

Papers citing "Like trainer, like bot? Inheritance of bias in algorithmic content moderation"


Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations David Hartmann Amin Oueslati Dimitri Staufer Lena Pohlmann Simon Munzert Hendrik Heuer 121 2 0 03 Mar 2025
A Survey on Online User Aggression: Content Detection and Behavioral Analysis on Social Media Swapnil S. Mane Suman Kundu Rajesh Sharma 136 0 0 31 Dec 2024
Identity-related Speech Suppression in Generative AI Content Moderation Oghenefejiro Isaacs Anigboro Charlie M. Crawford Danaë Metaxa Sorelle A. Friedler 145 0 0 09 Sep 2024
Rater Cohesion and Quality from a Vicarious Perspective Deepak Pandita Tharindu Cyril Weerasooriya Sujan Dutta Sarah K. K. Luger Tharindu Ranasinghe Ashiqur R. KhudaBukhsh Marcos Zampieri Christopher M. Homan 61 1 0 15 Aug 2024
From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions Trenton Chang Jenna Wiens 97 0 0 27 Jun 2024
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback Emilia Agis Lerner Florian E. Dorner Elliott Ash Naman Goel 64 1 0 09 Jun 2024
Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities Senjuti Dutta Sid Mittal Sherol Chen Deepak Ramachandran Ravi Rajakumar Ian D Kivlichan Sunny Mak Alena Butryna Praveen Paritosh 111 7 0 01 Nov 2023
Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation A. Seza Doğruöz Sunayana Sitaram Zheng-Xin Yong 75 14 0 31 Oct 2023
Bystanders of Online Moderation: Examining the Effects of Witnessing Post-Removal Explanations Shagun Jhaver Himanshu Rathi Koustuv Saha 74 14 0 15 Sep 2023
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis Nayeon Lee Chani Jung Jun-Hee Myung Jiho Jin Jose Camacho-Collados Juho Kim Alice Oh 102 23 0 31 Aug 2023
Peer Surveillance in Online Communities Kyle S. Beadle Marie Vasek 54 0 0 02 Aug 2023
Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning Tharindu Cyril Weerasooriya Sarah K. K. Luger Saloni Poddar Ashiqur R. KhudaBukhsh Christopher Homan 100 5 0 07 Jul 2023
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics Matthias Orlikowski Paul Röttger Philipp Cimiano 74 29 0 20 Jun 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety Christopher Homan Greg Serapio-García Lora Aroyo Mark Díaz Alicia Parrish Vinodkumar Prabhakaran Alex S. Taylor Ding Wang 86 9 0 20 Jun 2023
Safety and Fairness for Content Moderation in Generative Models Susan Hao Piyush Kumar Sarah Laszlo Shivani Poddar Bhaktipriya Radharapu Renee Shelby EGVM 85 21 0 09 Jun 2023
Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection Vyoma Raman Eve Fleisig Dan Klein 58 0 0 24 May 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research Luiza Amador Pozzobon Beyza Ermis Patrick Lewis Sara Hooker 93 48 0 24 Apr 2023
AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection Wenjie Yin Vibhor Agarwal Aiqi Jiang A. Zubiaga Nishanth R. Sastry 98 15 0 20 Dec 2022
Human-in-the-Loop Hate Speech Classification in a Multilingual Context Ana Kotarcic Dominik Hangartner Fabrizio Gilardi Selina Kurer K. Donnay 60 3 0 05 Dec 2022
Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions Rafal Kocielnik Sara Kangaslahti Shrimai Prabhumoye M. Hari R. Alvarez Anima Anandkumar 54 8 0 21 Nov 2022
Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering Helena Bonaldi Sara Dellantonio Serra Sinem Tekiroğlu Marco Guerini 106 44 0 07 Nov 2022
Addressing interpersonal harm in online gaming communities: the opportunities and challenges for a restorative justice approach Sijia Xiao Shagun Jhaver Niloufar Salehi 16 31 0 02 Nov 2022
Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda Andrés Domínguez Hernández Richard Owen Dan Saattrup Nielsen Ryan McConville 76 8 0 05 Oct 2022
Measuring the Prevalence of Anti-Social Behavior in Online Communities J. Park Joseph Seering Michael S. Bernstein 54 22 0 27 Aug 2022
Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models Virginia K. Felkner Ho-Chun Herbert Chang Eugene Jang Jonathan May OSLM 59 8 0 23 Jun 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation Nitesh Goyal Ian D Kivlichan Rachel Rosen Lucy Vasserman 93 94 0 01 May 2022
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study Serra Sinem Tekiroğlu Helena Bonaldi Margherita Fanton Marco Guerini 107 48 0 04 Apr 2022
Suum Cuique: Studying Bias in Taboo Detection with a Community Perspective Osama Khalid Jonathan Rusert P. Srinivasan 25 1 0 22 Mar 2022
Model Positionality and Computational Reflexivity: Promoting Reflexivity in Data Science S. Cambo Darren Gergle 72 36 0 08 Mar 2022
The Risks, Benefits, and Consequences of Prepublication Moderation: Evidence from 17 Wikipedia Language Editions Chau Tran Kaylea Champion Benjamin Mako Hill Rachel Greenstadt 23 2 0 11 Feb 2022
Dataset of Fake News Detection and Fact Verification: A Survey Taichi Murayama GNN 84 38 0 05 Nov 2021
Detecting Community Sensitive Norm Violations in Online Conversations Chan Young Park Julia Mendelsohn Karthik Radhakrishnan Kinjal Jain Tushar Kanakagiri David Jurgens Yulia Tsvetkov 92 24 0 09 Oct 2021
Mitigation of Diachronic Bias in Fake News Detection Dataset Taichi Murayama Shoko Wakamiya Eiji Aramaki AI4CE 104 13 0 28 Aug 2021
Towards Equal Gender Representation in the Annotations of Toxic Language Detection Elizabeth Excell Noura Al Moubayed 34 14 0 04 Jun 2021
3D4ALL: Toward an Inclusive Pipeline to Classify 3D Contents Nahyun Kwon Chen Liang Jeeeun Kim DiffM 36 1 0 24 Feb 2021
Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits Jack Bandy MLAU 82 118 0 03 Feb 2021
Cyberbullying Detection with Fairness Constraints O. Gencoglu 93 49 0 09 May 2020
Directions in Abusive Language Training Data: Garbage In, Garbage Out Bertie Vidgen Leon Derczynski 110 267 0 03 Apr 2020
The Risk to Population Health Equity Posed by Automated Decision Systems: A Narrative Review Mitchell Burger 27 6 0 18 Jan 2020
Designing Evaluations of Machine Learning Models for Subjective Inference: The Case of Sentence Toxicity Agathe Balayn A. Bozzon ELM 42 4 0 06 Nov 2019
Unfairness towards subjective opinions in Machine Learning Agathe Balayn A. Bozzon Zoltán Szlávik FaML 44 1 0 06 Nov 2019
Tackling Online Abuse: A Survey of Automated Abuse Detection Methods Pushkar Mishra H. Yannakoudakis Ekaterina Shutova 95 79 0 13 Aug 2019
Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts Samuel Carton Qiaozhu Mei Paul Resnick FAtt AAML 131 34 0 01 Sep 2018
Taking Turing by Surprise? Designing Digital Computers for morally-loaded contexts S. Delacroix 8 5 0 12 Mar 2018