arXiv:2012.15761
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
31 December 2020
Bertie Vidgen
Tristan Thrush
Zeerak Talat
Douwe Kiela
Papers citing
"Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection"
44 / 44 papers shown
Title
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
38
106
0
10 Apr 2025
CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models
Xiangyu Yin
Jiaxu Liu
Zhen Chen
Jinwei Hu
Yi Dong
Xiaowei Huang
Wenjie Ruan
AAML
45
0
0
08 Mar 2025
DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
44
1
0
21 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
65
2
0
02 Oct 2024
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Kunsheng Tang
Wenbo Zhou
Jie Zhang
Aishan Liu
Gelei Deng
Shuai Li
Peigui Qi
Weiming Zhang
Tianwei Zhang
Nenghai Yu
37
3
0
22 Aug 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
55
12
0
28 May 2024
Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph
Runsong Jia
Bowen Zhang
Sergio J. Rodríguez Méndez
Pouya Ghiasnezhad Omran
RALM
32
5
0
24 May 2024
Quite Good, but Not Enough: Nationality Bias in Large Language Models -- A Case Study of ChatGPT
Shucheng Zhu
Weikang Wang
Ying Liu
29
5
0
11 May 2024
HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models
Tanmay Sen
Ansuman Das
Mrinmay Sen
36
4
0
26 Apr 2024
Target Span Detection for Implicit Harmful Content
Nazanin Jafari
James Allan
Sheikh Muhammad Sarwar
32
1
0
28 Mar 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon
Patrick Lewis
Sara Hooker
B. Ermiş
36
7
0
06 Mar 2024
Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language
Hezhao Zhang
Lasana Harris
N. Moosavi
AILaw
41
1
0
21 Feb 2024
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
Aiqi Jiang
A. Zubiaga
AAML
26
3
0
17 Jan 2024
An Investigation of Large Language Models for Real-World Hate Speech Detection
Keyan Guo
Alexander Hu
Jaden Mu
Ziheng Shi
Ziming Zhao
Nishant Vishwamitra
Hongxin Hu
20
12
0
07 Jan 2024
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models
Jiang Zhang
Qiong Wu
Yiming Xu
Cheng Cao
Zheng Du
Konstantinos Psounis
28
14
0
13 Dec 2023
Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts
Xiruo Ding
Zhecheng Sheng
Brian Hur
Feng Chen
Serguei V. S. Pakhomov
Trevor Cohen
OOD
8
0
0
09 Dec 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
38
10
0
16 Nov 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
19
1
0
10 Oct 2023
Examining Temporal Bias in Abusive Language Detection
Mali Jin
Yida Mu
Diana Maynard
Kalina Bontcheva
26
5
0
25 Sep 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
21
122
0
02 Aug 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
35
3
0
23 Jul 2023
CL-UZH at SemEval-2023 Task 10: Sexism Detection through Incremental Fine-Tuning and Multi-Task Learning with Label Descriptions
Janis Goldzycher
11
1
0
06 Jun 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
11
401
0
13 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
20
42
0
31 Mar 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
10
117
0
07 Mar 2023
A Federated Approach for Hate Speech Detection
Jay Gala
Deep Gandhi
Jash Mehta
Zeerak Talat
13
4
0
18 Feb 2023
Cross-Reality Re-Rendering: Manipulating between Digital and Physical Realities
Siddhartha Datta
25
0
0
15 Nov 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
35
1
0
08 Nov 2022
BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets
Minju Kim
Chaehyeong Kim
Yongho Song
Seung-won Hwang
Jinyoung Yeo
31
13
0
23 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
23
10
0
20 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
34
7
0
14 Oct 2022
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
11
0
0
18 Sep 2022
Increasing Adverse Drug Events extraction robustness on social media: case study on negation and speculation
Simone Scaboro
Beatrice Portelli
Emmanuele Chersoni
Enrico Santus
G. Serra
22
5
0
06 Sep 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
22
65
0
20 Jun 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
19
16
0
09 May 2022
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
Serra Sinem Tekiroğlu
Helena Bonaldi
Margherita Fanton
Marco Guerini
16
43
0
04 Apr 2022
Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
40
5
0
23 Mar 2022
Reducing Target Group Bias in Hate Speech Detectors
Darsh J. Shah
Sinong Wang
Han Fang
Hao Ma
Luke Zettlemoyer
FaML
15
2
0
07 Dec 2021
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
29
266
0
15 Nov 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
367
0
15 Oct 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
32
105
0
07 Jul 2021
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
29
259
0
31 Dec 2020
A Framework for the Computational Linguistic Analysis of Dehumanization
Julia Mendelsohn
Yulia Tsvetkov
Dan Jurafsky
82
89
0
06 Mar 2020
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
237
319
0
21 Aug 2019