In this paper, we show that an embedding in Euclidean space based on tropical geometry generates stable sufficient statistics for barcodes: multiscale summaries of topological characteristics that capture the "shape" of data but have complex structures and are therefore difficult to use in statistical settings. Our sufficiency result allows classical probability distributions to be assumed on Euclidean representations of barcodes. This in turn makes a variety of parametric statistical inference methods applicable to barcodes, all while maintaining their initial interpretations. In particular, we show that exponential family distributions may be assumed, and that likelihoods for persistent homology may be constructed. We conceptually demonstrate sufficiency and illustrate its utility in persistent homology dimensions 0 and 1 with concrete parametric applications to HIV and avian influenza data.