
Critical Appraisal of Fairness Metrics in Clinical Predictive AI

Main: 32 pages
2 figures
2 tables
Abstract

Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but risks perpetuating biases if fairness is inadequately addressed. However, the definition of "fairness" remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a "fairness metric" as a measure quantifying whether a model discriminates (societally) against individuals or groups defined by sensitive attributes. We searched five databases (2014–2024), screened 820 records, included 41 studies, and extracted 62 fairness metrics. Metrics were classified by performance-dependency, model output level, and base performance metric, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, including only one clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness and identify gaps in uncertainty quantification, intersectionality, and real-world applicability. Future work should prioritise clinically meaningful metrics.
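To illustrate what a threshold-dependent group fairness metric looks like in practice, below is a minimal sketch of one common example: the equal opportunity difference (the gap in true positive rates between two groups defined by a sensitive attribute). This is not a metric or implementation taken from the paper; the function name, the 0.5 threshold, and the toy data are illustrative assumptions, and the sketch assumes a binary outcome, a binary classifier output, and exactly two groups.

```python
import numpy as np

def equal_opportunity_difference(y_true, y_prob, group, threshold=0.5):
    """Absolute difference in true positive rates between two groups.

    A threshold-dependent group fairness metric (illustrative sketch only):
    predicted risks are binarised at `threshold`, then the TPR is computed
    separately within each group among true positives.
    """
    y_true = np.asarray(y_true)
    group = np.asarray(group)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)

    tprs = []
    for g in np.unique(group):
        positives = (group == g) & (y_true == 1)  # true positives' indices in group g
        tprs.append(y_pred[positives].mean())     # TPR for group g
    return abs(tprs[0] - tprs[1])

# Toy usage: observed outcomes, predicted risks, and a binary sensitive attribute
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6, 0.35]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(equal_opportunity_difference(y_true, y_prob, group))  # ~0.17
```

Because the metric depends on the chosen threshold, its value can change when the decision threshold changes, which is one reason the review cautions against overreliance on threshold-dependent measures.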

@article{matos2025_2506.17035,
  title={Critical Appraisal of Fairness Metrics in Clinical Predictive AI},
  author={João Matos and Ben Van Calster and Leo Anthony Celi and Paula Dhiman and Judy Wawira Gichoya and Richard D. Riley and Chris Russell and Sara Khalid and Gary S. Collins},
  journal={arXiv preprint arXiv:2506.17035},
  year={2025}
}