Robust estimation of a regression function in exponential families

We observe pairs of independent random variables, each consisting of a covariate and a response, and assume, although this might not be true, that for each pair the conditional distribution of the response given the covariate belongs to a given exponential family with a real parameter whose value is an unknown function of the covariate. Given a model for this function, we propose an estimator with values in the model, the construction of which does not depend on the distribution of the covariates. We show that this estimator is robust to contamination, outliers and model misspecification. We establish non-asymptotic exponential inequalities for the upper deviations of a Hellinger-type distance between the true distribution of the data and the estimated one based on our estimator. We deduce a uniform risk bound for the estimator over the class of Hölderian functions and we prove the optimality of this bound up to a logarithmic factor. Finally, we provide an algorithm for calculating the estimator when the regression function is assumed to belong to functional classes of low or medium dimensions (in a suitable sense) and, in a simulation study, we compare its performance to that of the MLE and median-based estimators. The proof of our main result relies on an upper bound, with explicit numerical constants, on the expectation of the supremum of an empirical process over a VC-subgraph class. This bound may be of independent interest.
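To make the setting concrete, the following minimal sketch simulates one possible instance of it and shows why robustness matters. It is an illustration only and does not implement the estimator proposed here. Everything specific in it is an assumption chosen for the example: the Poisson family with natural parameter a + b*w, the true values (1, 1), uniform covariates, the 2% contamination rate, the outlier value 1000, and the helper name `fit_poisson_mle`.

```python
import numpy as np

def fit_poisson_mle(w, y, n_iter=50):
    """MLE of (a, b) in the model Y | W=w ~ Poisson(exp(a + b*w)), via Newton's method."""
    X = np.column_stack([np.ones_like(w), w])
    beta = np.array([np.log(y.mean()), 0.0])  # start from the best constant fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                 # fitted conditional means
        grad = X.T @ (y - mu)                 # score of the log-likelihood
        hess = X.T @ (X * mu[:, None])        # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 2000
theta_true = np.array([1.0, 1.0])             # assumed true (a, b)

# Covariates and responses drawn from the assumed exponential-family model.
w = rng.uniform(0.0, 1.0, size=n)
y = rng.poisson(np.exp(theta_true[0] + theta_true[1] * w))

# Contaminate 2% of the responses with a large outlying value.
y_contam = y.copy()
outlier_idx = rng.choice(n, size=n // 50, replace=False)
y_contam[outlier_idx] = 1000

beta_clean = fit_poisson_mle(w, y)
beta_contam = fit_poisson_mle(w, y_contam)

print("MLE on clean data:       ", beta_clean)
print("MLE on contaminated data:", beta_contam)
```

In this sketch the MLE recovers the parameters on clean data but is pulled far from them once a small fraction of responses is corrupted, which is the kind of failure a robust estimator is meant to avoid.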