BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models

Abstract

Pretrained Language Models (PLMs) harbor inherent social biases that can have harmful real-world implications. Such social biases are measured through the probability values that PLMs assign to test sentences mentioning different social groups and attributes. However, bias testing is currently cumbersome: test sentences are generated either from a limited set of manual templates or through expensive crowd-sourcing. We instead propose using ChatGPT for controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes to appear in the test sentences. Compared to template-based methods, our ChatGPT-based approach to test sentence generation is superior at detecting social bias, especially in challenging settings such as intersectional biases. We present a comprehensive open-source bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing. We also provide a large, diverse dataset of ChatGPT-generated test sentences that satisfy the specified social group and attribute requirements and match the quality of human-generated sentences. We thus enable seamless, open-ended social bias testing of PLMs through automatic large-scale generation of diverse test sentences for any combination of social categories and attributes.
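The probability-based bias measurement described above can be illustrated with a minimal sketch. Here `sentence_log_prob` is a hypothetical stand-in for a PLM's scoring function (in practice, e.g., summed token log-probabilities from a model such as BERT or GPT-2); the toy lexicon and the `bias_score` helper are assumptions for illustration only.

```python
def sentence_log_prob(sentence: str) -> float:
    # Toy stub standing in for a real PLM scorer; in practice this would
    # sum token log-probabilities produced by the model under test.
    lexicon = {"he": -1.0, "she": -1.5, "engineer": -2.0, "nurse": -2.5}
    return sum(lexicon.get(tok.lower(), -3.0) for tok in sentence.split())

def bias_score(template: str, group_a: str, group_b: str) -> float:
    """Difference in log-probability when the same test sentence mentions
    group A versus group B; a nonzero score indicates the model prefers
    one group in this attribute context."""
    sent_a = template.format(group=group_a)
    sent_b = template.format(group=group_b)
    return sentence_log_prob(sent_a) - sentence_log_prob(sent_b)

# A template pairing a social group slot with an attribute ("engineer").
score = bias_score("{group} is an engineer", "he", "she")
```

A positive score here means the stub assigns higher probability to the sentence mentioning the first group; a real bias test would aggregate such scores over many generated test sentences.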
