Revisiting maximum-a-posteriori estimation in log-concave models: from differential geometry to decision theory

SIAM Journal on Imaging Sciences (SIAM J. Imaging Sci.), 2016
Abstract

Maximum-a-posteriori (MAP) estimation is the main Bayesian estimation methodology in many areas of data science, such as mathematical imaging and machine learning, where high dimensionality is addressed by using models that are log-concave and whose posterior mode can be computed efficiently by convex optimisation algorithms. However, despite its success and rapid adoption, MAP estimation is not yet theoretically well understood, and the prevalent view is that it is generally not proper Bayesian estimation in a decision-theoretic sense. This paper presents a new decision-theoretic derivation of MAP estimation in Bayesian models that are log-concave. Our analysis is based on differential geometry and proceeds as follows. First, we exploit the log-concavity of the model to induce a Riemannian geometry on the parameter space. We then use differential geometry to identify the natural or canonical loss function for Bayesian point estimation in that Riemannian manifold. For log-concave models this canonical loss is the Bregman divergence of the negative log posterior density, a similarity measure rooted in convex analysis that accounts for the geometry of the space as well as the relative position of points, and which generalises the squared Euclidean distance to non-Euclidean settings. We then show that the MAP estimator is the Bayesian estimator that minimises the expected canonical loss, and that the posterior mean or MMSE estimator minimises the expected dual canonical loss. Finally, we establish universal performance and stability guarantees for MAP and MMSE estimation in high-dimensional log-concave models. These results provide a new understanding of MAP and MMSE estimation under log-concavity, and reveal new insights about their good empirical performance and about the roles that log-concavity plays in high-dimensional inference problems.
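To make the canonical loss concrete, the following minimal sketch (my own illustration, not code from the paper) works through the simplest log-concave case: a one-dimensional Gaussian posterior with assumed parameters `mu` and `sigma`. Here phi(x) = -log pi(x) is convex up to an additive constant, its Bregman divergence D_phi(u, v) = phi(u) - phi(v) - phi'(v)(u - v) reduces to the squared Mahalanobis distance, and a Monte Carlo estimate of the posterior expected Bregman loss is minimised at the posterior mode — which in this self-dual Gaussian case coincides with the MMSE estimate.

```python
import numpy as np

# Assumed toy posterior: x | y ~ N(mu, sigma^2). In the Gaussian case
# phi(x) = -log pi(x) = (x - mu)^2 / (2 sigma^2) + const is convex.
mu, sigma = 1.5, 0.7

def phi(x):
    return (x - mu) ** 2 / (2 * sigma ** 2)

def grad_phi(x):
    return (x - mu) / sigma ** 2

def bregman(u, v):
    # Bregman divergence of phi: phi(u) - phi(v) - <grad phi(v), u - v>.
    # For a quadratic phi this equals (u - v)^2 / (2 sigma^2).
    return phi(u) - phi(v) - grad_phi(v) * (u - v)

# Monte Carlo estimate of the posterior expected Bregman loss
# E[D_phi(x, delta)] for candidate estimators delta on a grid.
rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, size=50_000)
grid = np.linspace(mu - 2.0, mu + 2.0, 401)
risk = np.array([bregman(samples, d).mean() for d in grid])
delta_star = grid[np.argmin(risk)]

# The minimiser lands (up to grid and Monte Carlo error) on the
# posterior mode mu, which here is also the posterior mean.
print(delta_star)
```

For a non-Gaussian log-concave model the divergence is asymmetric, so minimising the expected canonical loss and its dual yields two distinct estimators (MAP and MMSE), as the abstract states; the Gaussian case above is the degenerate setting where they coincide.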
