
Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise

Abstract

We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial distribution shifts, where the labels can be arbitrary, and the goal is to find a ``best-fit'' function. More precisely, given training samples from a reference distribution $\mathcal{p}_0$, the goal is to approximate the vector $\mathbf{w}^*$ which minimizes the squared loss with respect to the worst-case distribution that is close in $\chi^2$-divergence to $\mathcal{p}_0$. We design a computationally efficient algorithm that recovers a vector $\hat{\mathbf{w}}$ satisfying $\mathbb{E}_{\mathcal{p}^*} (\sigma(\hat{\mathbf{w}} \cdot \mathbf{x}) - y)^2 \leq C \, \mathbb{E}_{\mathcal{p}^*} (\sigma(\mathbf{w}^* \cdot \mathbf{x}) - y)^2 + \epsilon$, where $C>1$ is a dimension-independent constant and $(\mathbf{w}^*, \mathcal{p}^*)$ is the witness attaining the min-max risk $\min_{\mathbf{w} : \|\mathbf{w}\| \leq W} \max_{\mathcal{p}} \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{p}} (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 - \nu \chi^2(\mathcal{p}, \mathcal{p}_0)$. Our algorithm follows a primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss. From an optimization standpoint, our work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.
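To make the min-max formulation concrete, the sketch below implements a generic projected gradient descent-ascent on an empirical version of the objective, where the adversarial distribution is parameterized by sample weights $q$ on the training set and $\chi^2(q, \text{uniform})$ penalizes deviation from the empirical reference distribution. This is only an illustrative primal-dual scheme under stated assumptions (a ReLU activation $\sigma$, a finite-sample reweighting of $\mathcal{p}_0$, and hypothetical step sizes `eta_w`, `eta_q`); it is not the algorithm analyzed in the paper.

```python
import numpy as np

def sigma(z):
    # ReLU activation; assumed here as the single-neuron nonlinearity
    return np.maximum(z, 0.0)

def project_simplex(v):
    # Euclidean projection onto the probability simplex
    n = len(v)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def primal_dual_sketch(X, y, nu=1.0, W=1.0, eta_w=1e-2, eta_q=1e-2, iters=2000):
    """Illustrative gradient descent-ascent on the empirical min-max objective
        min_{||w|| <= W} max_{q in simplex}
            sum_i q_i (sigma(w . x_i) - y_i)^2 - nu * chi^2(q, uniform),
    where chi^2(q, uniform) = n * sum_i (q_i - 1/n)^2.
    Hypothetical sketch only, not the paper's algorithm."""
    n, d = X.shape
    w = np.zeros(d)
    q = np.full(n, 1.0 / n)
    for _ in range(iters):
        margins = X @ w
        residual = sigma(margins) - y
        losses = residual ** 2
        # ascent step on the dual variables (per-sample weights)
        grad_q = losses - 2.0 * nu * n * (q - 1.0 / n)
        q = project_simplex(q + eta_q * grad_q)
        # descent step on the primal variable w (subgradient through the ReLU)
        grad_w = X.T @ (2.0 * q * residual * (margins > 0))
        w = w - eta_w * grad_w
        # project w back onto the norm ball of radius W
        norm = np.linalg.norm(w)
        if norm > W:
            w = w * (W / norm)
    return w, q
```

The dual ascent step upweights samples with large squared loss (the worst-case reweighting within the $\chi^2$ ball), while the primal descent step fits the neuron against that reweighted sample; the regularization strength $\nu$ controls how far the adversary may move from the reference distribution.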
