v1v2v3 (latest)

Efficient and minimal length parametric conformal prediction regions

9 May 2019

Abstract

Conformal prediction methods construct prediction regions for iid data that are valid in finite samples. We provide two parametric conformal prediction regions that are applicable for a wide class of continuous statistical models. This class of statistical models includes generalized linear models (GLMs) with continuous outcomes. Our parametric conformal prediction regions possesses finite sample validity, even when the model is misspecified, and are asymptotically of minimal length when the model is correctly specified. The first parametric conformal prediction region is constructed through binning of the predictor space, guarantees finite-sample local validity and is asymptotically minimal at the $\sqrt{\log(n)/n}$ rate when the dimension $d$ of the predictor space is one or two, and converges at the $O\{(\log(n)/n)^{1/d}\}$ rate when $d > 2$ . The second parametric conformal prediction region is constructed by transforming the outcome variable to a common distribution via the probability integral transform, guarantees finite-sample marginal validity, and is asymptotically minimal at the $\sqrt{\log(n)/n}$ rate. We develop a novel concentration inequality for maximum likelihood estimation that induces these convergence rates. We analyze prediction region coverage properties, large-sample efficiency, and robustness properties of four methods for constructing conformal prediction intervals for GLMs: fully nonparametric kernel-based conformal, residual based conformal, normalized residual based conformal, and parametric conformal which uses the assumed GLM density as a conformity measure. Extensive simulations compare these approaches to standard asymptotic prediction regions. The utility of the parametric conformal prediction region is demonstrated in an application to interval prediction of glycosylated hemoglobin levels, a blood measurement used to diagnose diabetes.

View on arXiv

Comments on this paper