Phylogenetic Indian Buffet Process: Theory and Applications in Integrative Analysis of Cancer Genomics

By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is one such approach: it defines a prior distribution over infinitely many binary features under the assumption that subjects are exchangeable. The phylogenetic Indian buffet process (pIBP), a derivative of the IBP, models non-exchangeability among subjects through a stochastic process on a rooted tree, similar to those used in phylogenetics, that describes the relationships among the subjects. In this paper, we study both the theoretical properties and the practical usefulness of the IBP and pIBP for binary factor models. For the theoretical analysis, we establish posterior convergence rates for both the IBP and pIBP and substantiate the theoretical results through simulation studies. As an application, we apply the IBP and pIBP to data arising in cancer genomics, where we incorporate somatic mutations as prior information into the analysis of gene expression data to study tumor heterogeneity. The results suggest that incorporating heterogeneity among subjects through the pIBP may lead to a better understanding of the molecular mechanisms underlying tumorigenesis and progression.
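
To make the exchangeable IBP prior concrete, the following is a minimal sketch of the standard sequential ("buffet") construction of a binary feature matrix. It is not taken from the paper and does not implement the pIBP or the binary factor model; the function name `sample_ibp` and the parameter names `num_subjects` and `alpha` are illustrative assumptions.

```python
import numpy as np


def sample_ibp(num_subjects, alpha, rng=None):
    """Draw a binary feature matrix from the standard one-parameter IBP.

    Illustrative sketch only: `sample_ibp`, `num_subjects`, and `alpha`
    are hypothetical names, not identifiers from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = []  # counts[k] = number of subjects so far that have feature k
    rows = []    # one list of 0/1 entries per subject

    for i in range(1, num_subjects + 1):
        # Subject i takes each existing feature k with probability counts[k] / i.
        row = [int(rng.random() < m / i) for m in counts]
        for k, z in enumerate(row):
            counts[k] += z

        # Subject i then introduces Poisson(alpha / i) brand-new features.
        num_new = rng.poisson(alpha / i)
        row.extend([1] * num_new)
        counts.extend([1] * num_new)
        rows.append(row)

    # Pad earlier rows with zeros so that all rows share the final width.
    Z = np.zeros((num_subjects, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z


if __name__ == "__main__":
    Z = sample_ibp(num_subjects=10, alpha=2.0, rng=np.random.default_rng(0))
    print(Z.shape)
    print(Z)
```

The number of columns is random and grows slowly with the number of subjects, which is what "infinite binary features" means in practice: only finitely many features are active in any finite sample. Under this construction the distribution of the matrix, up to column ordering, does not depend on the order of the subjects; the pIBP relaxes exactly that exchangeability assumption by letting a rooted tree govern how features are shared.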