Interpreting artificial neural networks to detect genome-wide association signals for complex traits

26 July 2024

Burak Yelmen

Maris Alver

Estonian Biobank Research Team

Flora Jay

L. Milani

Lili Milani

arXiv

Abstract

Investigating the genetic architecture of complex diseases is challenging due to the multifactorial and interactive landscape of genomic and environmental influences. Although genome-wide association studies (GWAS) have identified thousands of variants for multiple complex traits, conventional statistical approaches can be limited by simplified assumptions such as linearity and lack of epistasis in models. In this work, we trained artificial neural networks to predict complex traits using both simulated and real genotype-phenotype datasets. We extracted feature importance scores via different post hoc interpretability methods to identify potentially associated loci (PAL) for the target phenotype and devised an approach for obtaining p-values for the detected PAL. Simulations with various parameters demonstrated that associated loci can be detected with good precision using strict selection criteria. By applying our approach to the schizophrenia cohort in the Estonian Biobank, we detected multiple loci associated with this highly polygenic and heritable disorder. There was significant concordance between PAL and loci previously associated with schizophrenia and bipolar disorder, with enrichment analyses of genes within the identified PAL predominantly highlighting terms related to brain morphology and function. With advancements in model optimization and uncertainty quantification, artificial neural networks have the potential to enhance the identification of genomic loci associated with complex diseases, offering a more comprehensive approach for GWAS and serving as initial screening tools for subsequent functional studies.
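
The abstract does not say how p-values for the detected PAL were derived. As a minimal sketch of one common option, assuming feature importance scores are also computed for models trained on permuted phenotypes, an empirical p-value per locus could be obtained as follows. The function name `empirical_p_values` and the toy scores are hypothetical, and this is not presented as the authors' procedure.

```python
def empirical_p_values(observed, null_scores):
    """Empirical (permutation-style) p-value per locus.

    observed:    list of importance scores, one per locus (real phenotype).
    null_scores: list of rows, one row per permutation run, each row holding
                 one importance score per locus (permuted phenotype).
    Both inputs are illustrative assumptions, not the paper's data format.
    """
    n_perm = len(null_scores)
    p_values = []
    for j, obs in enumerate(observed):
        # Count null scores at least as large as the observed score.
        exceed = sum(1 for row in null_scores if row[j] >= obs)
        # Add-one correction keeps every p-value strictly above zero.
        p_values.append((exceed + 1) / (n_perm + 1))
    return p_values


# Toy scores for three loci and four permutation runs (made-up numbers).
observed = [0.9, 0.2, 0.5]
null_scores = [
    [0.1, 0.3, 0.5],
    [0.2, 0.1, 0.6],
    [0.3, 0.4, 0.2],
    [0.1, 0.2, 0.1],
]
p_values = empirical_p_values(observed, null_scores)
```

With only four permutations the smallest attainable p-value is 1/5, so a genome-wide analysis under this scheme would need far more null runs, or a pooled null across loci, to reach strict significance thresholds.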
