Robust estimation of a regression function in exponential families

We observe pairs of independent random variables, each consisting of a covariate and a response, and assume, although this might not be true, that for each pair the conditional distribution of the response given the covariate belongs to a given exponential family with a real parameter whose value is a function of the covariate. Given a model for this regression function, we propose an estimator with values in the model whose construction does not depend on the distribution of the covariates and which is robust to contamination, outliers and model misspecification. We establish non-asymptotic exponential inequalities for the upper deviations of a Hellinger-type distance between the true distribution of the data and the one estimated from this estimator. Under a suitable parametrization of the exponential family, we deduce a uniform risk bound for the estimator over the class of Hölderian functions and prove that this bound is optimal up to a logarithmic factor. Finally, we provide an algorithm for computing the estimator when the regression function is assumed to belong to functional classes of low or medium dimension (in a suitable sense) and, in a simulation study, we compare its performance to that of the MLE and median-based estimators. The proof of our main result relies on an upper bound, with explicit numerical constants, on the expectation of the supremum of an empirical process over a VC-subgraph class. This bound may be of independent interest.
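To make the robustness issue concrete, the following minimal sketch contrasts the MLE with a median-based estimator under contamination. It does not implement the estimator proposed in the paper. It assumes the simplest possible setting, a Gaussian location family with unit variance and a constant regression function, and the function name `simulate`, the contamination rate, and the outlier value are all hypothetical choices made for illustration.

```python
import random
import statistics

TRUE_THETA = 1.0  # assumed constant value of the regression function


def simulate(n=200, contamination=0.05, outlier=50.0, seed=0):
    """Draw n observations from N(TRUE_THETA, 1) and replace a fraction
    `contamination` of them by a fixed outlying value (hypothetical setup)."""
    rng = random.Random(seed)
    n_bad = int(contamination * n)
    clean = [rng.gauss(TRUE_THETA, 1.0) for _ in range(n - n_bad)]
    return clean + [outlier] * n_bad


data = simulate()

# In the Gaussian location family the MLE of the mean is the sample mean.
mle = statistics.fmean(data)
# A median-based estimator ignores the magnitude of the outliers.
med = statistics.median(data)

err_mle = abs(mle - TRUE_THETA)
err_med = abs(med - TRUE_THETA)

print(f"MLE (sample mean): {mle:.3f}, absolute error {err_mle:.3f}")
print(f"Median:            {med:.3f}, absolute error {err_med:.3f}")
```

With 5% of the sample replaced by a distant value, the sample mean is pulled far from the true parameter while the median moves only slightly. This is the kind of sensitivity that a robust estimator is meant to avoid.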