The GUS Framework: Benchmarking Social Bias Classification with Discriminative (Encoder-Only) and Generative (Decoder-Only) Language Models

10 October 2024

Maximus Powers

Harshitha Reddy Jonala

Ansh Tiwari

ArXiv (abs)PDF HTML HuggingFace (2 upvotes)

Main:22 Pages

6 Figures

Bibliography:7 Pages

15 Tables

Appendix:1 Pages

Abstract

The detection of social bias in text is a critical challenge, particularly due to the limitations of binary classification methods. These methods often oversimplify nuanced biases, leading to high emotional impact when content is misclassified as either "biased" or "fair." To address these shortcomings, we propose a more nuanced framework that focuses on three key linguistic components underlying social bias: Generalizations, Unfairness, and Stereotypes (the GUS framework). The GUS framework employs a semi-automated approach to create a comprehensive synthetic dataset, which is then verified by humans to maintain ethical standards. This dataset enables robust multi-label token classification. Our methodology, which combines discriminative (encoder-only) models and generative (auto-regressive large language models), identifies biased entities in text. Through extensive experiments, we demonstrate that encoder-only models are effective for this complex task, often outperforming state-of-the-art methods, both in terms of macro and entity-wise F1-score and Hamming loss. These findings can guide the choice of model for different use cases, highlighting the GUS framework's effectiveness in capturing explicit and implicit biases across diverse contexts, and offering a pathway for future research and applications in various fields.

View on arXiv

Comments on this paper