Estimating Model Performance Under Covariate Shift Without Labels

Main: 10 pages · Appendix: 12 pages · Bibliography: 3 pages · 7 figures · 2 tables
Abstract

Machine learning models often experience performance degradation post-deployment due to shifts in data distribution. Accurately assessing post-deployment performance is challenging when labels are missing or delayed. Existing proxy methods, such as drift detection, fail to adequately measure the effects of these shifts. To address this, we introduce Probabilistic Adaptive Performance Estimation (PAPE), a new method for evaluating classification models on unlabeled data that accurately quantifies the impact of covariate shift on model performance. PAPE is model- and data-type-agnostic and works with any performance metric. Crucially, it operates independently of the original model, relying only on its predictions and probability estimates, and requires no assumptions about the nature of the shift, learning directly from data instead. We tested PAPE on over 900 dataset-model combinations derived from US census data, assessing its performance against several benchmarks across various metrics. Our findings show that PAPE outperforms other methodologies, making it a superior choice for estimating the performance of classification models.
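The abstract does not detail the algorithm, but the core principle behind probabilistic performance estimation can be illustrated with a minimal sketch. Assuming the model's probability estimates are well calibrated on the unlabeled production data, each arg-max prediction is correct with probability equal to its largest class probability, so averaging those maxima yields an expected accuracy without any labels. The function name and toy data below are hypothetical illustrations, not the authors' implementation; PAPE itself goes further by adapting the calibration to the shifted covariate distribution.

```python
import numpy as np

def estimate_accuracy_without_labels(proba):
    """Estimate expected accuracy from calibrated class probabilities.

    proba: array of shape (n_samples, n_classes) holding calibrated
    probabilities for unlabeled production samples. Assuming the model
    predicts the arg-max class, each prediction is correct with
    probability max_k p_ik, so the expected accuracy is the mean of
    the row-wise maxima.
    """
    proba = np.asarray(proba)
    return float(np.max(proba, axis=1).mean())

# Toy example: four unlabeled samples, three classes.
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.10, 0.80],
])
print(estimate_accuracy_without_labels(proba))  # (0.9+0.7+0.4+0.8)/4 = 0.70
```

Under covariate shift the original calibration no longer holds, which is why a naive estimate of this kind degrades; recalibrating the probabilities against the shifted input distribution, as the paper proposes, is what makes the estimate adaptive.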
