ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.00086
  4. Cited By
Challenges in Automated Debiasing for Toxic Language Detection

Challenges in Automated Debiasing for Toxic Language Detection

29 January 2021
Xuhui Zhou
Maarten Sap
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
ArXivPDFHTML

Papers citing "Challenges in Automated Debiasing for Toxic Language Detection"

22 / 22 papers shown
Title
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun-Xiong Xia
Tianyi Wu
Zhiwei Xue
Y. Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
129
13
0
30 Jan 2025
Generative AI for Hate Speech Detection: Evaluation and Findings
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
25
11
0
16 Nov 2023
A Survey on Fairness in Large Language Models
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
37
59
0
20 Aug 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation
  Policies
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
35
3
0
23 Jul 2023
Having Beer after Prayer? Measuring Cultural Bias in Large Language
  Models
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous
Michael Joseph Ryan
Alan Ritter
Wei-ping Xu
24
84
0
23 May 2023
Backdoor Learning for NLP: Recent Advances, Challenges, and Future
  Research Directions
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar
SILM
AAML
16
20
0
14 Feb 2023
Towards Agile Text Classifiers for Everyone
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
24
13
0
13 Feb 2023
Rating Sentiment Analysis Systems for Bias through a Causal Lens
Rating Sentiment Analysis Systems for Bias through a Causal Lens
Kausik Lakkaraju
Biplav Srivastava
Marco Valtorta
18
7
0
04 Feb 2023
Multi-VALUE: A Framework for Cross-Dialectal English NLP
Multi-VALUE: A Framework for Cross-Dialectal English NLP
Caleb Ziems
William B. Held
Jingfeng Yang
Jwala Dhamala
Rahul Gupta
Diyi Yang
27
40
0
15 Dec 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as
  Artificial Adversaries?
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
29
1
0
08 Nov 2022
The State of Profanity Obfuscation in Natural Language Processing
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
31
7
0
14 Oct 2022
Towards Debiasing Translation Artifacts
Towards Debiasing Translation Artifacts
Koel Dutta Chowdhury
Rricha Jalota
C. España-Bonet
Josef van Genabith
23
6
0
16 May 2022
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes
Antonis Maronikolakis
Philip Baader
Hinrich Schütze
17
9
0
13 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
24
6
0
11 Apr 2022
"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During
  the COVID-19 Pandemic
"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic
H. Nghiem
Fred Morstatter
17
8
0
04 Dec 2021
Annotators with Attitudes: How Annotator Beliefs And Identities Bias
  Toxic Language Detection
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
27
264
0
15 Nov 2021
Mitigating Racial Biases in Toxic Language Detection with an
  Equity-Based Ensemble Framework
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Matan Halevy
Camille Harris
A. Bruckman
Diyi Yang
A. Howard
34
35
0
27 Sep 2021
Detecting Inspiring Content on Social Media
Detecting Inspiring Content on Social Media
Oana Ignat
Y-Lan Boureau
Jane A. Yu
A. Halevy
22
6
0
06 Sep 2021
Just Say No: Analyzing the Stance of Neural Dialogue Generation in
  Offensive Contexts
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti
Maarten Sap
Alan Ritter
Mark O. Riedl
11
84
0
26 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and
  Tooling
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
27
105
0
07 Jul 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
6
121
0
13 Apr 2021
Empirical Analysis of Multi-Task Learning for Reducing Model Bias in
  Toxic Comment Detection
Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection
Ameya Vaidya
Feng Mai
Yue Ning
104
20
0
21 Sep 2019
1