Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model

Abstract

We derive approximation bounds for learning single neuron models using thresholded gradient descent when both the labels and the covariates are possibly corrupted adversarially. We assume the data follows the model $y = \sigma(\mathbf{w}^{*} \cdot \mathbf{x}) + \xi$, where $\sigma$ is a nonlinear activation function, the noise $\xi$ is Gaussian, and the covariate vector $\mathbf{x}$ is sampled from a sub-Gaussian distribution. We study sigmoidal, leaky-ReLU, and ReLU activation functions and derive an $O(\nu\sqrt{\epsilon\log(1/\epsilon)})$ approximation bound in $\ell_{2}$-norm, with sample complexity $O(d/\epsilon)$ and failure probability $e^{-\Omega(d)}$. We also study the linear regression problem, where $\sigma(x) = x$. We derive an $O(\nu\epsilon\log(1/\epsilon))$ approximation bound, improving upon the previous $O(\nu)$ approximation bounds for the gradient-descent based iterative thresholding algorithms of Bhatia et al. (NeurIPS 2015) and Shen and Sanghavi (ICML 2019). Our algorithm has an $O(\textrm{polylog}(N,d)\log(R/\epsilon))$ runtime complexity when $\|\mathbf{w}^{*}\|_2 \leq R$, improving upon the $O(\textrm{polylog}(N,d)/\epsilon^2)$ runtime complexity of Awasthi et al. (NeurIPS 2022).
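
To make the flavor of the method concrete, the sketch below implements a generic iterative-thresholding gradient descent for the single-neuron model above: at each step, the samples with the largest residuals (an assumed $\epsilon$ fraction, treated as corrupted) are discarded before a gradient step is taken on the rest. The function names, step size, iteration count, and synthetic data are illustrative assumptions and not the paper's prescribed algorithm or hyperparameters.

```python
import numpy as np

def thresholded_gd(X, y, activation, activation_grad, eps, step=0.5, iters=500):
    """Illustrative iterative-thresholding gradient descent (not the paper's exact procedure).

    At each iteration, an eps fraction of samples with the largest absolute
    residuals is removed (hard thresholding), and a gradient step on the
    squared loss is taken over the retained samples.
    """
    N, d = X.shape
    keep = N - int(np.ceil(eps * N))          # number of samples retained per step
    w = np.zeros(d)
    for _ in range(iters):
        residuals = np.abs(activation(X @ w) - y)
        idx = np.argsort(residuals)[:keep]    # keep the samples with smallest residuals
        Xs, ys = X[idx], y[idx]
        z = Xs @ w
        # Gradient of the squared loss on the retained samples.
        grad = Xs.T @ ((activation(z) - ys) * activation_grad(z)) / keep
        w = w - step * grad
    return w

# Example with a sigmoidal activation on synthetic, partially corrupted data.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = lambda z: sigmoid(z) * (1.0 - sigmoid(z))

rng = np.random.default_rng(0)
N, d = 2000, 10
X = rng.standard_normal((N, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = sigmoid(X @ w_star) + 0.01 * rng.standard_normal(N)
y[: N // 20] = rng.uniform(-5.0, 5.0, N // 20)   # corrupt a small fraction of labels
w_hat = thresholded_gd(X, y, sigmoid, sigmoid_grad, eps=0.05)
```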
