Efficient active learning of sparse halfspaces with arbitrary bounded noise

Neural Information Processing Systems (NeurIPS), 2020
Abstract

In this work we study active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ under label noise. Even in the absence of label noise this is a challenging problem, and only recently have label complexity bounds of the form $\tilde{O}\left(s \cdot \mathrm{polylog}(d, \frac{1}{\epsilon})\right)$ been established by Zhang (2018) for computationally efficient algorithms under the broad class of isotropic log-concave distributions. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse. When the label noise satisfies the Massart condition (Massart and Nédélec, 2006), i.e., each label is flipped with probability at most $\eta$ for a parameter $\eta \in [0, \frac{1}{2})$, the work of Awasthi et al. (2016) provides a computationally efficient active learning algorithm under isotropic log-concave distributions with label complexity $\tilde{O}\left(s^{\mathrm{poly}(1/(1-2\eta))} \, \mathrm{poly}(\log d, \frac{1}{\epsilon})\right)$. Hence that algorithm is label-efficient only when the noise rate $\eta$ is a constant. In this work, we substantially improve on the state of the art by designing a polynomial-time algorithm for active learning of $s$-sparse halfspaces under bounded noise and isotropic log-concave distributions, with a label complexity of $\tilde{O}\left(\frac{s}{(1-2\eta)^4} \, \mathrm{polylog}(d, \frac{1}{\epsilon})\right)$. Hence, our new algorithm is label-efficient even for noise rates close to $\frac{1}{2}$. Prior to our work, such a result was not known even for the random classification noise model. Our algorithm builds upon the existing margin-based algorithmic framework: at each iteration it performs a sequence of online mirror descent updates on a carefully chosen loss sequence, using a novel gradient update rule that accounts for the bounded noise.
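To make the mirror descent ingredient concrete, below is a minimal sketch of a single online mirror descent (OMD) step with the $p$-norm mirror map $\Phi(w) = \frac{1}{2}\|w\|_p^2$ for $p$ close to 1, a standard choice for promoting sparsity in this line of work. The step size, the hinge-style surrogate gradient, and all variable names are illustrative assumptions for exposition; they are not the paper's actual loss sequence or noise-adjusted gradient rule.

```python
import numpy as np

def pnorm_link(v, r):
    """Gradient of Phi_r(v) = 0.5 * ||v||_r^2. The p-link and its dual
    q-link (1/p + 1/q = 1) are inverse maps of each other."""
    norm = np.linalg.norm(v, ord=r)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (r - 1) * norm ** (2 - r)

def omd_step(w, grad, lr, p):
    """One online mirror descent update with mirror map 0.5 * ||.||_p^2."""
    q = p / (p - 1)                        # dual exponent
    theta = pnorm_link(w, p) - lr * grad   # gradient step in the dual space
    return pnorm_link(theta, q)            # map back to the primal space

# Illustrative usage: p = 1 + 1/ln(d) biases the iterates toward sparsity.
d = 10_000
rng = np.random.default_rng(0)
p = 1 + 1 / np.log(d)
w = np.zeros(d)
w[0] = 1.0                                 # arbitrary unit-sparse start
x = rng.standard_normal(d) / np.sqrt(d)    # an example queried for its label
y = 1.0                                    # its possibly noise-corrupted label
# Hypothetical hinge-style surrogate gradient; the paper's loss sequence and
# noise-aware gradient update rule are more involved.
g = -y * x if y * np.dot(w, x) < 1.0 else np.zeros(d)
w = omd_step(w, g, lr=0.1, p=p)
```

With $p = 1 + 1/\ln d$ the dual exponent is $q = \ln d + 1$, so the update behaves like a multiplicative-weights scheme whose regret scales with $\log d$ rather than $d$, which is what lets the label complexity depend on the sparsity $s$ only polylogarithmically in the ambient dimension.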
