$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets

We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses. It extends the standard probit model by replacing its link function, the standard normal cdf, by the cdf of a $p$-generalized normal distribution for $p \in [1, \infty)$. The $p$-generalized normal distributions \citep{Sub23} are of special interest in statistical modeling because they can be fitted to data much more flexibly. Their tail behavior can be controlled by the choice of the parameter $p$, which influences the model's sensitivity to outliers. Special cases include the Laplace, the Gaussian, and the uniform distributions. We further show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data by combining sketching techniques with importance subsampling to obtain a small data summary called a coreset.
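To make the model concrete, the following is a minimal sketch of the $p$-generalized probit likelihood, not the paper's implementation. It assumes the density normalization $\propto \exp(-|t|^p/p)$, which is obtained from SciPy's `gennorm` by setting `scale = p**(1/p)`, and it assumes labels coded as $\pm 1$. The function names `p_probit_cdf` and `neg_log_likelihood` are hypothetical. The sketching and coreset construction is not shown.

```python
import numpy as np
from scipy.stats import gennorm, norm


def p_probit_cdf(t, p):
    """CDF of a p-generalized normal with density proportional to exp(-|t|^p / p).

    Assumption: this normalization is obtained from scipy's gennorm
    (density proportional to exp(-|t/scale|^p)) via scale = p**(1/p).
    """
    return gennorm.cdf(t, p, scale=p ** (1.0 / p))


def neg_log_likelihood(beta, X, y, p):
    """Negative log-likelihood for labels y in {-1, +1} (assumed coding).

    By symmetry of the distribution, P(y | x) = Phi_p(y * <x, beta>).
    """
    margins = y * (X @ beta)
    return -np.sum(gennorm.logcdf(margins, p, scale=p ** (1.0 / p)))


if __name__ == "__main__":
    # p = 2 should coincide with the standard probit link.
    print(np.isclose(p_probit_cdf(0.7, 2.0), norm.cdf(0.7)))

    # Tiny synthetic example (illustrative data only).
    X = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
    y = np.array([1, -1, 1])
    print(neg_log_likelihood(np.zeros(2), X, y, p=1.5))
```

Under this parametrization, $p = 2$ gives the standard probit model and $p = 1$ gives a Laplace link. Minimizing `neg_log_likelihood` over `beta`, for instance with a generic numerical optimizer, would give the maximum likelihood estimate on the full data.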