Causality on Longitudinal Data: Stable Specification Search in Constrained Structural Equation Modeling

22 May 2016

R. Rahmadi

P. Groot

Marieke HC van Rijn

Jan AJG van den Brand

M. Heins

H. Knoop

Tom Heskes

the Alzheimer's Disease Neuroimaging Initiatives

the MASTERPLAN Study Group

the OPTIMISTIC Consortium

Abstract

Developing causal models from observational longitudinal studies is an important, ubiquitous problem in many disciplines. In the medical domain, especially in the case of rare diseases, revealing causal relationships from a given data set may lead to improvements in clinical practice, e.g., the development of treatment and medication. Many causal discovery methods have been introduced in the past decades. A disadvantage of these causal discovery algorithms, however, is the inherent instability of structure estimation: with finite data samples, small changes in the data can lead to completely different optimal structures. This work presents a new causal discovery algorithm for longitudinal data that is robust to finite data samples. The method works as follows. We represent causal models as structural equation models. Models are scored on two objectives: model fit and model complexity. Because these objectives often conflict, we use a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from optimal models) that are both stable and parsimonious, which are then used to infer a causal model. To validate the method, we compare it with the state-of-the-art PC algorithm on a simulated data set with a known ground truth model. Furthermore, we present the results of our discovery algorithm on three real-world longitudinal data sets about chronic fatigue syndrome, Alzheimer's disease, and chronic kidney disease; these results have been corroborated by medical experts and the literature.
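The subsample, search, and select loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each candidate model is a dictionary with hypothetical keys `misfit` (lower means better fit), `complexity` (for example, number of edges), and `edges` (a set of directed pairs). The callable `fit_candidates` is a hypothetical stand-in for the multi-objective evolutionary search, and the sketch simplifies by pooling edge frequencies over the whole Pareto front rather than tracking them per complexity level.

```python
import random
from collections import Counter


def pareto_front(models):
    """Keep models that no other model dominates on (misfit, complexity).

    Both objectives are minimized. The dict keys are assumptions of this sketch.
    """
    front = []
    for m in models:
        dominated = any(
            o["misfit"] <= m["misfit"]
            and o["complexity"] <= m["complexity"]
            and (o["misfit"] < m["misfit"] or o["complexity"] < m["complexity"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front


def edge_stability(data, fit_candidates, n_subsamples=50, seed=0):
    """Estimate how often each edge appears in Pareto optimal models.

    `fit_candidates(subsample)` is a hypothetical callable that returns a list
    of candidate models for one subsample; here it replaces the evolutionary
    search. Each subsample is half of the data, drawn without replacement.
    """
    rng = random.Random(seed)
    counts = Counter()
    half = len(data) // 2
    for _ in range(n_subsamples):
        subsample = rng.sample(data, half)
        front = pareto_front(fit_candidates(subsample))
        # Count each edge at most once per subsample.
        edges = set().union(*(m["edges"] for m in front))
        counts.update(edges)
    return {edge: c / n_subsamples for edge, c in counts.items()}


def select_stable(stability, threshold=0.6):
    """Return edges whose selection frequency meets an illustrative threshold."""
    return {edge for edge, freq in stability.items() if freq >= threshold}
```

Edges returned by `select_stable` would then serve as the stable substructures from which a causal model is inferred. The threshold of 0.6 is an arbitrary illustrative value, not one taken from the paper.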
