Consensus of state of the art mortality prediction models: From all-cause mortality to sudden death prediction

Worldwide, many millions of people die suddenly and unexpectedly each year, either with or without a prior history of cardiovascular disease. Such events are sparse (once in a lifetime), many victims will not have had prior investigations for cardiac disease and many different definitions of sudden death exist. Accordingly, sudden death is hard to predict. This analysis used NHS Electronic Health Records (EHRs) for people aged 50 years living in the Greater Glasgow and Clyde (GG\&C) region in 2010 (n = 380,000) to try to overcome these challenges. We investigated whether medical history, blood tests, prescription of medicines, and hospitalisations might, in combination, predict a heightened risk of sudden death. We compared the performance of models trained to predict either sudden death or all-cause mortality. We built six models for each outcome of interest: three taken from state-of-the-art research (BEHRT, Deepr and Deep Patient), and three of our own creation. We trained these using two different data representations: a language-based representation, and a sparse temporal matrix. We used global interpretability to understand the most important features of each model, and compare how much agreement there was amongst models using Rank Biased Overlap. It is challenging to account for correlated variables without increasing the complexity of the interpretability technique. We overcame this by clustering features into groups and comparing the most important groups for each model. We found the agreement between models to be much higher when accounting for correlated variables. Our analysis emphasises the challenge of predicting sudden death and emphasises the need for better understanding and interpretation of machine learning models applied to healthcare applications.
View on arXiv