Contexts Matter: An Empirical Study on Contextual Influence in Fairness Testing for Deep Learning Systems

International Symposium on Empirical Software Engineering and Measurement (ESEM), 2024

12 August 2024

Chengwen Du

Tao Chen


Main: 10 Pages

8 Figures

Bibliography: 1 Page

6 Tables

Appendix: 1 Page

Abstract

Background: Fairness testing for deep learning systems has become increasingly important. However, much work assumes perfect context and conditions from the other parts: well-tuned hyperparameters for accuracy, rectified bias in the data, and mitigated bias in the labeling. Yet these are often difficult to achieve in practice due to their resource- and labour-intensive nature. Aims: In this paper, we aim to understand how varying contexts affect fairness testing outcomes. Method: We conduct an extensive empirical study, covering 10,800 cases, to investigate how contexts can change the fairness testing result at the model level against the existing assumptions. We also study why these outcomes were observed through the lens of correlation/fitness landscape analysis. Results: Our results show that different context types and settings generally have a significant impact on the testing, which is mainly caused by shifts of the fitness landscape under varying contexts. Conclusions: Our findings provide key insights for practitioners to evaluate test generators and hint at future research directions.
