Isolating effects of age with fair representation learning when assessing dementia
One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, normal aging also affects these linguistic features, in ways that resemble but are distinct from the effects of dementia. Aging is therefore a confounding factor whose effects have been hard for machine learning classifiers to isolate. In this paper, we show that deep neural network (DNN) classifiers can infer age from linguistic features, an entanglement that could lead to unfairness with respect to this covariate. We explain this problem with a v-structure in causal diagrams and address it with fair representation learning. We build neural network classifiers that learn low-dimensional representations that reflect the impact of dementia but do not contain age-related information. To evaluate these classifiers, we specify a model-agnostic score measuring how well classifier results are disentangled from age. Our best models outperform baseline neural network classifiers in disentanglement, while compromising accuracy by as little as 2.56% and 2.25% on DementiaBank and the Famous People dataset, respectively.