Generalized linear models (GLMs) extend linear regression by generating the dependent variables through a nonlinear function of a predictor in a Reproducing Kernel Hilbert Space. Despite the nonconvexity of the underlying optimization problem, the GLM-tron algorithm of Kakade et al. (2011) provably learns GLMs with guarantees of computational and statistical efficiency. We present an extension of the GLM-tron to a mirror descent or natural gradient-like setting, which we call the Reflectron. The Reflectron enjoys the same statistical guarantees as the GLM-tron for any choice of the mirror descent potential function. We show that the choice of potential can be used to exploit the underlying optimization geometry and improve statistical guarantees, or to define an optimization geometry and thereby implicitly regularize the model. The implicit bias of the algorithm can be used to impose advantageous priors on the learned weights, such as sparsity-promoting priors. Our results extend to the case of multiple outputs with or without weight sharing, and we further show that the Reflectron can be used for online learning of GLMs in the realizable or bounded-noise settings. We primarily perform our analysis in continuous time, which leads to simple derivations, and we subsequently prove matching guarantees for a discrete-time implementation. We supplement our theoretical analysis with simulations on real and synthetic datasets demonstrating the validity of our theoretical results.
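To make the idea concrete, below is a minimal sketch, not the paper's reference implementation, of what a mirror-descent-style GLM-tron update could look like: the GLM-tron pseudo-gradient is applied in the dual space defined by a potential's mirror map and then mapped back to the weights. All names (`reflectron_step`, `grad_psi`, `grad_psi_inv`) and the specific step-size convention are illustrative assumptions; the exact algorithm and its guarantees are given in the paper.

```python
# Hypothetical sketch of a mirror-descent GLM-tron ("Reflectron"-style) update,
# assuming a GLM of the form y ~ u(<w, x>) with a known monotone link u and a
# user-supplied potential via its mirror map grad_psi and inverse grad_psi_inv.
import numpy as np

def reflectron_step(w, X, y, u, grad_psi, grad_psi_inv, lr=1.0):
    """One illustrative mirror-descent GLM-tron step.

    w            : current weight estimate, shape (d,)
    X            : data matrix, shape (n, d)
    y            : targets, shape (n,)
    u            : link function, applied elementwise
    grad_psi     : mirror map (gradient of the potential)
    grad_psi_inv : inverse mirror map
    lr           : step size
    """
    residuals = u(X @ w) - y                 # prediction errors
    grad = X.T @ residuals / len(y)          # GLM-tron-style pseudo-gradient
    # Update in the dual (mirror) space, then map back to primal weights.
    return grad_psi_inv(grad_psi(w) - lr * grad)

# Example: the Euclidean potential psi(w) = 0.5 * ||w||^2 has the identity as
# its mirror map and recovers a GLM-tron-like update; other potentials change
# the implicit geometry (and hence the implicit bias) of the iterates.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    w_true = rng.normal(size=5)
    y = np.tanh(X @ w_true)                  # noiseless realizable data
    w = np.zeros(5)
    for _ in range(200):
        w = reflectron_step(w, X, y, np.tanh,
                            grad_psi=lambda v: v, grad_psi_inv=lambda v: v,
                            lr=0.5)
    print("estimation error:", np.linalg.norm(w - w_true))
```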