Robust estimation of a regression function in exponential families

Abstract

We observe $n$ pairs of independent (but not necessarily i.i.d.) random variables $X_{1}=(W_{1},Y_{1}),\ldots,X_{n}=(W_{n},Y_{n})$ and tackle the problem of estimating the conditional distributions $Q_{i}^{\star}(w_{i})$ of $Y_{i}$ given $W_{i}=w_{i}$ for all $i\in\{1,\ldots,n\}$. Even though these might not be true, we base our estimator on the assumptions that the data are i.i.d. and the conditional distributions of $Y_{i}$ given $W_{i}=w_{i}$ belong to a one-parameter exponential family $\bar{\mathscr{Q}}$ with parameter space given by an interval $I$. More precisely, we pretend that these conditional distributions take the form $Q_{\boldsymbol{\theta}(w_{i})}\in \bar{\mathscr{Q}}$ for some $\boldsymbol{\theta}$ that belongs to a VC-class $\bar{\boldsymbol{\Theta}}$ of functions with values in $I$. For each $i\in\{1,\ldots,n\}$, we estimate $Q_{i}^{\star}(w_{i})$ by a distribution of the same form, i.e. $Q_{\hat{\boldsymbol{\theta}}(w_{i})}\in \bar{\mathscr{Q}}$, where $\hat{\boldsymbol{\theta}}=\hat{\boldsymbol{\theta}}(X_{1},\ldots,X_{n})$ is a well-chosen estimator with values in $\bar{\boldsymbol{\Theta}}$. We show that our estimation strategy is robust to model misspecification, contamination and the presence of outliers. Besides, we provide an algorithm for calculating $\hat{\boldsymbol{\theta}}$ when $\bar{\boldsymbol{\Theta}}$ is a VC-class of functions of low or moderate dimension and we carry out a simulation study to compare the performance of $\hat{\boldsymbol{\theta}}$ to that of the MLE and median-based estimators.
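
To make the working model concrete, the following is a minimal sketch of one possible instance of the setting, not the authors' estimator or their simulation design. It assumes the exponential family is the Poisson family in its natural parametrization, that $\boldsymbol{\theta}(w)=a+bw$ is linear, and that a small fraction of the responses are replaced by a fixed outlier. The names `theta_linear`, `simulate`, `contamination` and `outlier` are hypothetical and chosen only for illustration.

```python
import math
import random


def poisson_pmf(y, theta):
    """Poisson density in natural form: exp(theta*y - exp(theta)) / y!."""
    return math.exp(theta * y - math.exp(theta) - math.lgamma(y + 1))


def theta_linear(w, a, b):
    """Hypothetical regression function theta(w) = a + b*w (an assumption)."""
    return a + b * w


def sample_poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method.

    Adequate for the small means used here; not meant for large means.
    """
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, a, b, contamination=0.05, outlier=50, seed=0):
    """Simulate n pairs (w, y) from a contaminated version of the model.

    With probability `contamination` the response is replaced by the fixed
    value `outlier`, so the working model is deliberately misspecified.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = rng.uniform(0.0, 1.0)
        if rng.random() < contamination:
            y = outlier
        else:
            y = sample_poisson(rng, math.exp(theta_linear(w, a, b)))
        data.append((w, y))
    return data


if __name__ == "__main__":
    sample = simulate(200, a=0.0, b=1.0)
    print(len(sample), "pairs; largest response:", max(y for _, y in sample))
```

In such a setup, an estimator that is robust in the sense of the abstract should recover a regression function close to $a+bw$ despite the contaminated responses, whereas the MLE may be pulled toward the outliers.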
