Complexity Theoretic Limitations on Learning Halfspaces

Abstract

We study the problem of agnostically learning halfspaces, defined by a fixed but unknown distribution $\mathcal{D}$ on $\mathbb{Q}^n \times \{\pm 1\}$. We define $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D})$ as the least error of a halfspace classifier for $\mathcal{D}$. A learner with access to $\mathcal{D}$ must return a hypothesis whose error is small compared to $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D})$. Using the recently developed method of the author, Linial, and Shalev-Shwartz, we prove hardness-of-learning results under a natural assumption on the complexity of refuting random $K$-$\mathrm{XOR}$ formulas. We show that no efficient learning algorithm has non-trivial worst-case performance, even under the guarantees that $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D}) \le \eta$ for an arbitrarily small constant $\eta > 0$, and that $\mathcal{D}$ is supported in $\{\pm 1\}^n \times \{\pm 1\}$. Namely, even under these favorable conditions, its error must be $\ge \frac{1}{2} - \frac{1}{n^c}$ for every $c > 0$. In particular, no efficient algorithm can achieve a constant approximation ratio. Under a stronger version of the assumption (where $K$ can be poly-logarithmic in $n$), we can take $\eta = 2^{-\log^{1-\nu}(n)}$ for arbitrarily small $\nu > 0$. Interestingly, this is even stronger than the best known lower bounds (Arora et al. 1993, Feldman et al. 2006, Guruswami and Raghavendra 2006) for the case where the learner is restricted to returning a halfspace classifier (i.e., proper learning).
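For concreteness, the quantity $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D})$ referenced above can be written out explicitly. The following is a sketch of the standard formalization, assuming the usual definition of a halfspace as a sign of an affine function (the abstract itself does not spell this out):

$$\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D}) \;=\; \min_{h \in \mathrm{HALF}} \ \Pr_{(x,y) \sim \mathcal{D}}\left[ h(x) \ne y \right], \qquad \mathrm{HALF} \;=\; \left\{\, x \mapsto \operatorname{sign}\!\left(\langle w, x \rangle + b\right) \;:\; w \in \mathbb{Q}^n,\ b \in \mathbb{Q} \,\right\}.$$

In this notation, the main result says that even when this minimum is at most a small constant $\eta$, every efficient learner's hypothesis must err with probability at least $\frac{1}{2} - \frac{1}{n^c}$, i.e. barely better than random guessing.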
