Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.00086
Cited By
Challenges in Automated Debiasing for Toxic Language Detection
29 January 2021
Xuhui Zhou
Maarten Sap
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Challenges in Automated Debiasing for Toxic Language Detection"
22 / 22 papers shown
Title
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun-Xiong Xia
Tianyi Wu
Zhiwei Xue
Y. Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
129
13
0
30 Jan 2025
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
25
11
0
16 Nov 2023
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
37
59
0
20 Aug 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
35
3
0
23 Jul 2023
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous
Michael Joseph Ryan
Alan Ritter
Wei-ping Xu
24
84
0
23 May 2023
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar
SILM
AAML
16
20
0
14 Feb 2023
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
24
13
0
13 Feb 2023
Rating Sentiment Analysis Systems for Bias through a Causal Lens
Kausik Lakkaraju
Biplav Srivastava
Marco Valtorta
18
7
0
04 Feb 2023
Multi-VALUE: A Framework for Cross-Dialectal English NLP
Caleb Ziems
William B. Held
Jingfeng Yang
Jwala Dhamala
Rahul Gupta
Diyi Yang
27
40
0
15 Dec 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
29
1
0
08 Nov 2022
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
31
7
0
14 Oct 2022
Towards Debiasing Translation Artifacts
Koel Dutta Chowdhury
Rricha Jalota
C. España-Bonet
Josef van Genabith
23
6
0
16 May 2022
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes
Antonis Maronikolakis
Philip Baader
Hinrich Schütze
17
9
0
13 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
24
6
0
11 Apr 2022
"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic
H. Nghiem
Fred Morstatter
17
8
0
04 Dec 2021
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
27
264
0
15 Nov 2021
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Matan Halevy
Camille Harris
A. Bruckman
Diyi Yang
A. Howard
34
35
0
27 Sep 2021
Detecting Inspiring Content on Social Media
Oana Ignat
Y-Lan Boureau
Jane A. Yu
A. Halevy
22
6
0
06 Sep 2021
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti
Maarten Sap
Alan Ritter
Mark O. Riedl
11
84
0
26 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
27
105
0
07 Jul 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
6
121
0
13 Apr 2021
Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection
Ameya Vaidya
Feng Mai
Yue Ning
104
20
0
21 Sep 2019
1