Maximum likelihood aggregation and misspecified generalized linear models
We study a natural extension of the pure aggregation problem that handles more general response distributions in a regression setup with random or deterministic design. While this extension bears strong connections with generalized linear models, it requires neither identifiability of the parameter nor that the model is true. We show that this problem can be solved by constrained likelihood maximization, and we derive sharp oracle inequalities that hold both in expectation and with high probability. A new proof technique yields error bounds that are accurate even for small sample sizes and provides guidelines for choosing the geometry of the constraint. To illustrate the main results, we derive generalization error bounds for the LogitBoost algorithm in binary classification with a natural convex loss function.
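To make the aggregation setup concrete, the following is a minimal sketch of convex aggregation by constrained likelihood maximization under logistic (LogitBoost-style) loss: given the predictions of a dictionary of base predictors, it maximizes the likelihood over the probability simplex via an exponentiated-gradient update. The function names, the choice of exponentiated gradient as the solver, and all parameters are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def logistic_loss(margins):
    # Mean negative log-likelihood for labels in {-1, +1}:
    # (1/n) * sum_i log(1 + exp(-y_i * f(x_i)))
    return np.mean(np.log1p(np.exp(-margins)))

def aggregate_simplex(F, y, n_iter=500, eta=0.5):
    """Aggregate base predictors by constrained likelihood maximization
    over the simplex (hypothetical exponentiated-gradient sketch).

    F : (n, M) array, F[i, j] = prediction of base predictor j on example i
    y : (n,) array of labels in {-1, +1}
    """
    n, M = F.shape
    w = np.full(M, 1.0 / M)           # start at the barycenter of the simplex
    for _ in range(n_iter):
        margins = y * (F @ w)
        # Gradient of the mean logistic loss with respect to w
        g = -(F * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w = w * np.exp(-eta * g)      # multiplicative update keeps w > 0
        w /= w.sum()                  # renormalize onto the simplex
    return w
```

The simplex constraint here is one choice of geometry; the oracle inequalities in the paper quantify how such a choice affects the aggregation error.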