HateBERT: Retraining BERT for Abusive Language Detection in English

23 October 2020

Papers citing "HateBERT: Retraining BERT for Abusive Language Detection in English"

40 / 40 papers shown

Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs Sai Krishna Mendu Harish Yenala Aditi Gulati Shanu Kumar Parag Agrawal 29 0 0 04 May 2025
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering Hao Zhuo Yicheng Yang Kewen Peng 25 0 0 21 Apr 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses Rohitash Chandra Aryan Chaudhary Yeshwanth Rayavarapu 44 0 0 27 Mar 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation Shiza Ali Jeremy Blackburn Gianluca Stringhini 59 0 0 24 Feb 2025
Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement Junyu Lu Kai Ma Kaichun Wang Kelaiti Xiao Roy Ka-Wei Lee Bo Xu Liang Yang Hongfei Lin 46 0 0 10 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet Berk Atil Vipul Gupta Sarkar Snigdha Sarathi Das R. Passonneau 167 0 0 07 Feb 2025
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs Rohitash Chandra Guoxiang Ren G. Houseman 46 0 0 20 Jan 2025
Towards Efficient and Explainable Hate Speech Detection via Model Distillation Paloma Piot Javier Parapar 83 173 0 18 Dec 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models Seanie Lee Haebin Seong Dong Bok Lee Minki Kang Xiaoyin Chen Dominik Wagner Yoshua Bengio Juho Lee Sung Ju Hwang 67 2 0 02 Oct 2024
Towards Generalized Offensive Language Identification A. Dmonte Tejas Arya Tharindu Ranasinghe Marcos Zampieri 44 3 0 26 Jul 2024
ToVo: Toxicity Taxonomy via Voting Tinh Son Luong Thanh-Thien Le Thang Viet Doan L. Van Thien Huu Nguyen Diep Thi-Ngoc Nguyen 31 0 0 21 Jun 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations Preetam Prabhu Srikar Dammu Hayoung Jung Anjali Singh Monojit Choudhury Tanushree Mitra 32 8 0 08 May 2024
Target Span Detection for Implicit Harmful Content Nazanin Jafari James Allan Sheikh Muhammad Sarwar 38 1 0 28 Mar 2024
Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales Ayushi Nirmal Amrita Bhattacharjee Paras Sheth Huan Liu AAML 35 11 0 19 Mar 2024
InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks Somnath Banerjee Maulindu Sarkar Punyajoy Saha Binny Mathew Animesh Mukherjee TDI 34 0 0 22 Feb 2024
Efficient Models for the Detection of Hate, Abuse and Profanity Christoph Tillmann Aashka Trivedi Bishwaranjan Bhattacharjee VLM 11 0 0 08 Feb 2024
Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda Richard Kimera Daniela N. Rim Joseph Kirabira Ubong Godwin Udomah Heeyoul Choi AI4MH 25 1 0 25 Jan 2024
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models Jiang Zhang Qiong Wu Yiming Xu Cheng Cao Zheng Du Konstantinos Psounis 30 14 0 13 Dec 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study Maike Zufle Verna Dankers Ivan Titov 34 0 0 16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings Sagi Pendzel Tomer Wullach Amir Adler Einat Minkov 25 11 0 16 Nov 2023
Pre-training LLMs using human-like development data corpus Khushi Bhardwaj Raj Sanjay Shah Sashank Varma 22 6 0 08 Nov 2023
LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification K. Chernyshev E. Garanina Duygu Bayram Qiankun Zheng Lukas Edman 11 0 0 08 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation Rahul Madhavan Rishabh Garg Kahini Wadhawan S. Mehta 21 5 0 01 Jun 2023
Detecting Multidimensional Political Incivility on Social Media Sagi Pendzel Nir Lotan Alon Zoizner Einat Minkov 16 1 0 24 May 2023
Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback Shang-ling Hsu Raj Sanjay Shah Prathik Senthil Zahra Ashktorab Casey Dugan Werner Geyer Diyi Yang 44 20 0 15 May 2023
NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset Sana Al-Azzawi Gyorgy Kovács Filip Nilsson Tosin P. Adewumi Marcus Liwicki 20 6 0 25 Apr 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism Hannah Rose Kirk Wenjie Yin Bertie Vidgen Paul Röttger 10 117 0 07 Mar 2023
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer Shanu Kumar Abbaraju Soujanya Sandipan Dandapat Sunayana Sitaram Monojit Choudhury VLM 25 1 0 04 Mar 2023
Towards Agile Text Classifiers for Everyone Maximilian Mozes Jessica Hoffmann Katrin Tomanek Muhamed Kouate Nithum Thain Ann Yuan Tolga Bolukbasi Lucas Dixon 34 13 0 13 Feb 2023
A benchmark for toxic comment classification on Civil Comments dataset Corentin Duchene Henri Jamet Pierre Guillaume Reda Dehak 18 8 0 26 Jan 2023
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation Tsu-jui Fu Licheng Yu Ning Zhang Cheng-Yang Fu Jong-Chyi Su William Yang Wang Sean Bell VGen 56 37 0 23 Nov 2022
Dictionary-Assisted Supervised Contrastive Learning Patrick Y. Wu Richard Bonneau Joshua A. Tucker Jonathan Nagler CLIP 25 0 0 27 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing Debora Nozza Dirk Hovy 34 7 0 14 Oct 2022
T5 for Hate Speech, Augmented Data and Ensemble Tosin P. Adewumi Sana Sabah Sabry Nosheen Abid F. Liwicki Marcus Liwicki 6 10 0 11 Oct 2022
Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection Omkar Gokhale Aditya Kane Shantanu Patankar Tanmay Chavan Raviraj Joshi VLM 27 7 0 09 Oct 2022
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice Mohit Singhal Chen Ling Pujan Paudel Poojitha Thota Nihal Kumarswamy Gianluca Stringhini Shirin Nilizadeh 75 28 0 29 Jun 2022
Detecting Harmful Online Conversational Content towards LGBTQIA+ Individuals Jamell Dacon Harry Shomer Shaylynn Crum-Dacon Jiliang Tang 17 8 0 15 Jun 2022
bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments Vitthal Bhandari Poonam Goyal 23 16 0 27 Mar 2022
Large-Scale Hate Speech Detection with Cross-Domain Transfer Cagri Toraman Furkan Şahinuç E. Yilmaz 24 58 0 02 Mar 2022
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases Shrimai Prabhumoye Rafal Kocielnik M. Shoeybi Anima Anandkumar Bryan Catanzaro 27 20 0 15 Dec 2021