
Supersparse Linear Integer Models for Optimized Medical Scoring Systems

Abstract

Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and accommodate operational constraints. We present a new method for creating data-driven scoring systems called Supersparse Linear Integer Models (SLIM). SLIM scoring systems are built by solving a discrete optimization problem that directly encodes measures of accuracy (the 0–1 loss) and sparsity (the $\ell_0$-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints that are difficult for other methods to accommodate. We provide bounds on the testing and training accuracy of SLIM scoring systems, as well as a new data reduction technique that can improve scalability by discarding a portion of the training data. We present results from an ongoing collaboration with the Massachusetts General Hospital Sleep Apnea Laboratory, where SLIM is being used to construct a highly tailored scoring system for sleep apnea screening.
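To make the objective described above concrete, here is a minimal sketch that finds a tiny scoring system by exhaustive enumeration instead of the discrete optimization solver the authors use. Several details are assumptions for illustration only: labels are in {-1, +1}, a prediction counts as an error when y * score <= 0 (ties are errors), the intercept ranges over the same small integers as the coefficients and is not penalized, the sparsity weight `c0` is an arbitrary choice, and the coprime restriction is approximated by skipping any integer vector whose entries share a common divisor greater than 1. The function names `zero_one_loss` and `brute_force_slim` are hypothetical. Enumeration grows exponentially with the number of features, so this only illustrates the objective on toy data.

```python
from functools import reduce
from itertools import product
from math import gcd


def zero_one_loss(coefs, intercept, X, y):
    """Fraction of examples with y_i * score_i <= 0 (ties count as errors)."""
    errors = 0
    for xi, yi in zip(X, y):
        # A scoring system only adds up small integer "points".
        score = intercept + sum(c * v for c, v in zip(coefs, xi))
        if yi * score <= 0:
            errors += 1
    return errors / len(y)


def brute_force_slim(X, y, max_coef=3, c0=0.01):
    """Hypothetical toy search: minimize 0-1 loss + c0 * (number of nonzero coefs).

    Every intercept and coefficient is drawn from {-max_coef, ..., max_coef}.
    Returns (objective, intercept, coefs) for the first minimizer found.
    """
    d = len(X[0])
    values = range(-max_coef, max_coef + 1)
    best = None
    for intercept in values:
        for coefs in product(values, repeat=d):
            # Stand-in for the coprime restriction: keep only vectors whose
            # entries (intercept included) have greatest common divisor 1.
            # This also skips the all-zero vector, whose gcd is 0.
            if reduce(gcd, coefs, abs(intercept)) != 1:
                continue
            l0 = sum(c != 0 for c in coefs)  # l0-seminorm of the coefficients
            objective = zero_one_loss(coefs, intercept, X, y) + c0 * l0
            if best is None or objective < best[0]:
                best = (objective, intercept, coefs)
    return best


if __name__ == "__main__":
    # Toy data: only the first binary feature predicts the label.
    X = [[1, 0], [1, 1], [0, 1], [0, 0]]
    y = [1, 1, -1, -1]
    print(brute_force_slim(X, y))
```

On this toy data the search returns a one-feature model (intercept -2, coefficients (3, 0)) with zero training error. The small penalty `c0` ensures that an extra nonzero coefficient is only worth adding if it removes at least one training error.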
