arXiv:1707.01477
Like trainer, like bot? Inheritance of bias in algorithmic content moderation
5 July 2017
Reuben Binns
Michael Veale
Max Van Kleek
N. Shadbolt
ArXiv (abs)
Papers citing
"Like trainer, like bot? Inheritance of bias in algorithmic content moderation"
44 / 44 papers shown
Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
David Hartmann
Amin Oueslati
Dimitri Staufer
Lena Pohlmann
Simon Munzert
Hendrik Heuer
121
2
0
03 Mar 2025
A Survey on Online User Aggression: Content Detection and Behavioral Analysis on Social Media
Swapnil S. Mane
Suman Kundu
Rajesh Sharma
136
0
0
31 Dec 2024
Identity-related Speech Suppression in Generative AI Content Moderation
Oghenefejiro Isaacs Anigboro
Charlie M. Crawford
Danaë Metaxa
Sorelle A. Friedler
145
0
0
09 Sep 2024
Rater Cohesion and Quality from a Vicarious Perspective
Deepak Pandita
Tharindu Cyril Weerasooriya
Sujan Dutta
Sarah K. K. Luger
Tharindu Ranasinghe
Ashiqur R. KhudaBukhsh
Marcos Zampieri
Christopher M. Homan
61
1
0
15 Aug 2024
From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions
Trenton Chang
Jenna Wiens
97
0
0
27 Jun 2024
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
Emilia Agis Lerner
Florian E. Dorner
Elliott Ash
Naman Goel
64
1
0
09 Jun 2024
Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities
Senjuti Dutta
Sid Mittal
Sherol Chen
Deepak Ramachandran
Ravi Rajakumar
Ian D Kivlichan
Sunny Mak
Alena Butryna
Praveen Paritosh University of Tennessee
111
7
0
01 Nov 2023
Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation
A. Seza Doğruöz
Sunayana Sitaram
Zheng-Xin Yong
75
14
0
31 Oct 2023
Bystanders of Online Moderation: Examining the Effects of Witnessing Post-Removal Explanations
Shagun Jhaver
Himanshu Rathi
Koustuv Saha
74
14
0
15 Sep 2023
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis
Nayeon Lee
Chani Jung
Jun-Hee Myung
Jiho Jin
Jose Camacho-Collados
Juho Kim
Alice Oh
102
23
0
31 Aug 2023
Peer Surveillance in Online Communities
Kyle S. Beadle
Marie Vasek
54
0
0
02 Aug 2023
Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning
Tharindu Cyril Weerasooriya
Sarah K. K. Luger
Saloni Poddar
Ashiqur R. KhudaBukhsh
Christopher Homan
100
5
0
07 Jul 2023
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
Matthias Orlikowski
Paul Röttger
Philipp Cimiano
Italy
74
29
0
20 Jun 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan
Greg Serapio-García
Lora Aroyo
Mark Díaz
Alicia Parrish
Vinodkumar Prabhakaran
Alex S. Taylor
Ding Wang
86
9
0
20 Jun 2023
Safety and Fairness for Content Moderation in Generative Models
Susan Hao
Piyush Kumar
Sarah Laszlo
Shivani Poddar
Bhaktipriya Radharapu
Renee Shelby
EGVM
85
21
0
09 Jun 2023
Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection
Vyoma Raman
Eve Fleisig
Dan Klein
58
0
0
24 May 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
Luiza Amador Pozzobon
Beyza Ermis
Patrick Lewis
Sara Hooker
93
48
0
24 Apr 2023
AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection
Wenjie Yin
Vibhor Agarwal
Aiqi Jiang
A. Zubiaga
Nishanth R. Sastry
98
15
0
20 Dec 2022
Human-in-the-Loop Hate Speech Classification in a Multilingual Context
Ana Kotarcic
Dominik Hangartner
Fabrizio Gilardi
Selina Kurer
K. Donnay
60
3
0
05 Dec 2022
Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
Rafal Kocielnik
Sara Kangaslahti
Shrimai Prabhumoye
M. Hari
R. Alvarez
Anima Anandkumar
54
8
0
21 Nov 2022
Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering
Helena Bonaldi
Sara Dellantonio
Serra Sinem Tekiroğlu
Marco Guerini
106
44
0
07 Nov 2022
Addressing interpersonal harm in online gaming communities: the opportunities and challenges for a restorative justice approach
Sijia Xiao
Shagun Jhaver
Niloufar Salehi
16
31
0
02 Nov 2022
Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda
Andrés Domínguez Hernández
Richard Owen
Dan Saattrup Nielsen
Ryan McConville
76
8
0
05 Oct 2022
Measuring the Prevalence of Anti-Social Behavior in Online Communities
J. Park
Joseph Seering
Michael S. Bernstein
54
22
0
27 Aug 2022
Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models
Virginia K. Felkner
Ho-Chun Herbert Chang
Eugene Jang
Jonathan May
OSLM
59
8
0
23 Jun 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation
Nitesh Goyal
Ian D Kivlichan
Rachel Rosen
Lucy Vasserman
93
94
0
01 May 2022
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
Serra Sinem Tekiroğlu
Helena Bonaldi
Margherita Fanton
Marco Guerini
107
48
0
04 Apr 2022
Suum Cuique: Studying Bias in Taboo Detection with a Community Perspective
Osama Khalid
Jonathan Rusert
P. Srinivasan
25
1
0
22 Mar 2022
Model Positionality and Computational Reflexivity: Promoting Reflexivity in Data Science
S. Cambo
Darren Gergle
72
36
0
08 Mar 2022
The Risks, Benefits, and Consequences of Prepublication Moderation: Evidence from 17 Wikipedia Language Editions
Chau Tran
Kaylea Champion
Benjamin Mako Hill
Rachel Greenstadt
23
2
0
11 Feb 2022
Dataset of Fake News Detection and Fact Verification: A Survey
Taichi Murayama
GNN
84
38
0
05 Nov 2021
Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park
Julia Mendelsohn
Karthik Radhakrishnan
Kinjal Jain
Tushar Kanakagiri
David Jurgens
Yulia Tsvetkov
92
24
0
09 Oct 2021
Mitigation of Diachronic Bias in Fake News Detection Dataset
Taichi Murayama
Shoko Wakamiya
Eiji Aramaki
AI4CE
104
13
0
28 Aug 2021
Towards Equal Gender Representation in the Annotations of Toxic Language Detection
Elizabeth Excell
Noura Al Moubayed
34
14
0
04 Jun 2021
3D4ALL: Toward an Inclusive Pipeline to Classify 3D Contents
Nahyun Kwon
Chen Liang
Jeeeun Kim
DiffM
36
1
0
24 Feb 2021
Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits
Jack Bandy
MLAU
82
118
0
03 Feb 2021
Cyberbullying Detection with Fairness Constraints
O. Gencoglu
93
49
0
09 May 2020
Directions in Abusive Language Training Data: Garbage In, Garbage Out
Bertie Vidgen
Leon Derczynski
110
267
0
03 Apr 2020
The Risk to Population Health Equity Posed by Automated Decision Systems: A Narrative Review
Mitchell Burger
27
6
0
18 Jan 2020
Designing Evaluations of Machine Learning Models for Subjective Inference: The Case of Sentence Toxicity
Agathe Balayn
A. Bozzon
ELM
42
4
0
06 Nov 2019
Unfairness towards subjective opinions in Machine Learning
Agathe Balayn
A. Bozzon
Zoltán Szlávik
FaML
44
1
0
06 Nov 2019
Tackling Online Abuse: A Survey of Automated Abuse Detection Methods
Pushkar Mishra
H. Yannakoudakis
Ekaterina Shutova
95
79
0
13 Aug 2019
Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts
Samuel Carton
Qiaozhu Mei
Paul Resnick
FAtt
AAML
131
34
0
01 Sep 2018
Taking Turing by Surprise? Designing Digital Computers for morally-loaded contexts
S. Delacroix
8
5
0
12 Mar 2018