Relabeling Minimal Training Subset to Flip a Prediction

Findings, 2023
Abstract

When facing an unsatisfactory prediction from a machine learning model, it is crucial to investigate the underlying reasons and explore the potential for reversing the outcome. We ask: to flip the prediction on a test point $x_t$, how can we identify the smallest training subset $\mathcal{S}_t$ that we need to relabel? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 2% of the training points can always flip a prediction. This mechanism serves multiple purposes: (1) providing an approach to challenge a model prediction by altering training points; (2) evaluating model robustness via the cardinality of the subset (i.e., $|\mathcal{S}_t|$); we show that $|\mathcal{S}_t|$ is strongly related to the noise ratio in the training set and is correlated with, yet complementary to, predicted probabilities; (3) revealing training points that lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.
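To make the abstract's procedure concrete, here is a minimal sketch of an influence-function approach to this problem, assuming a binary logistic-regression model and a first-order approximation; it is an illustration of the general technique, not the paper's exact algorithm, and all function names (`flip_set`, `grad_loss`, etc.) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    # Gradient of the logistic loss for a single point (labels in {0, 1}).
    return (sigmoid(x @ w) - y) * x

def hessian(w, X, reg=1e-3):
    # Hessian of the (L2-regularized) mean logistic loss over the training set.
    p = sigmoid(X @ w)
    H = (X.T * (p * (1.0 - p))) @ X / len(X)
    return H + reg * np.eye(X.shape[1])

def flip_set(w, X_train, y_train, x_test):
    """Greedily collect training points whose relabeling pushes the test
    logit across the decision boundary, ranked by influence.

    Assumes w was fit on (X_train, y_train); uses the standard
    influence-function linearization, so the returned subset is an
    estimate, not a certified minimum.
    """
    n = len(X_train)
    H_inv = np.linalg.inv(hessian(w, X_train))
    # First-order parameter change from relabeling point i (y_i -> 1 - y_i):
    #   delta_w_i ~= -H^{-1} (grad L(x_i, 1 - y_i) - grad L(x_i, y_i)) / n
    deltas = np.array([
        -H_inv @ (grad_loss(w, X_train[i], 1 - y_train[i])
                  - grad_loss(w, X_train[i], y_train[i])) / n
        for i in range(n)
    ])
    logit = x_test @ w
    effects = deltas @ x_test          # estimated shift in the test logit per relabel
    direction = -np.sign(logit)        # we want to move the logit toward zero and past it
    order = np.argsort(-direction * effects)

    subset, shift = [], 0.0
    for i in order:
        if direction * effects[i] <= 0:
            break                      # remaining points only push the wrong way
        subset.append(int(i))
        shift += effects[i]
        if np.sign(logit + shift) != np.sign(logit):
            return subset              # prediction flips (to first order)
    return None                        # no flipping subset under this approximation
```

Points are ranked by their estimated effect on the test logit and added until the accumulated shift crosses the decision boundary; since individual influence estimates are additive only to first order, a practical implementation would verify the flip by retraining on the relabeled set.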
