
Bayesian nonparametric dependent model for the study of diversity for species data

Abstract

We introduce a dependent Bayesian nonparametric model for the probabilistic modelling of species-by-site data, i.e. population data where observations at different sites are classified into distinct species. These data can be represented as a frequency matrix giving the number of times each species is observed at each site. Our aim is to study the impact of additional factors (covariates), for instance environmental factors, on the data structure, and in particular on the diversity. To that end, we introduce dependence a priori across the covariates, and show that it improves posterior inference. We use a dependent version of the Griffiths-Engen-McCloskey distribution, the distribution of the weights of the Dirichlet process, along the same lines as the Dependent Dirichlet process is defined. The prior is thus defined via the stick-breaking construction, where the weights are obtained by transforming a Gaussian process, and the dependence stems from the covariance function of the latter. Some distributional properties of the model are explored. A Markov chain Monte Carlo algorithm for posterior sampling is described, along with the sampling scheme of the predictive distribution for unobserved factors. Both samplers are illustrated on simulated data and on a real data set obtained in experiments conducted in Antarctic soil.
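
To make the stick-breaking construction concrete, the following is a minimal sketch of how covariate-dependent weights could be simulated from the prior. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the transformation or the covariance function, so the squared-exponential kernel, the probability-integral transform to Beta(1, M) marginals, the truncation at `n_sticks`, and all function and parameter names here are hypothetical choices.

```python
import math
import numpy as np


def squared_exp_cov(x, length_scale=1.0):
    """Squared-exponential covariance over 1-D covariate values (assumed kernel)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def dependent_gem_weights(x, n_sticks=50, M=1.0, length_scale=1.0, seed=0):
    """Draw truncated stick-breaking weights that vary smoothly with covariate x.

    Returns an array of shape (n_sticks, len(x)); column i holds the species
    weights at covariate value x[i]. Truncation at n_sticks is an approximation.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # Covariance across covariate values, with jitter for numerical stability.
    K = squared_exp_cov(x, length_scale) + 1e-6 * np.eye(len(x))
    L = np.linalg.cholesky(K)

    # One independent Gaussian process draw per stick, correlated across x.
    Z = rng.standard_normal((n_sticks, len(x))) @ L.T

    # Standard normal CDF maps each marginal to (approximately) Uniform(0, 1).
    U = 0.5 * (1.0 + np.vectorize(math.erf)(Z / math.sqrt(2.0)))

    # Inverse CDF of Beta(1, M): F^{-1}(u) = 1 - (1 - u)^(1/M).
    V = 1.0 - (1.0 - U) ** (1.0 / M)

    # Stick-breaking: p_j = V_j * prod_{l < j} (1 - V_l), separately for each x.
    remaining = np.cumprod(1.0 - V, axis=0)
    remaining = np.vstack([np.ones((1, len(x))), remaining[:-1]])
    return V * remaining


# Example: weights at five covariate values on [0, 1].
covariates = np.linspace(0.0, 1.0, 5)
weights = dependent_gem_weights(covariates, n_sticks=50, M=1.0)
```

In this sketch, each column of `weights` is marginally a truncated draw from the Griffiths-Engen-McCloskey distribution with parameter `M`, while nearby covariate values share similar weights because the underlying Gaussian draws are correlated through the kernel. A larger `length_scale` gives stronger dependence across covariates.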
