We study pool-based active learning of halfspaces, in which a learner receives a pool of unlabeled examples and iteratively queries a teacher for the labels of examples from the pool, in order to identify the labels of all pool examples. We revisit the idea of greedily selecting examples to label, and use it to derive an efficient algorithm, called ALuMA, that approximates the optimal label complexity for a given pool in $\mathbb{R}^d$. We show that ALuMA achieves this approximation guarantee whenever the examples in the pool are represented with finite accuracy. We further prove a result for general hypothesis classes, showing that a slight change to the greedy approach leads to an improved target-dependent guarantee on the label complexity. In particular, we conclude a better guarantee for ALuMA if the target hypothesis has a large margin. Finally, we compare our approach to other common active learning strategies, and provide a theoretical and empirical evaluation of its advantages and disadvantages.
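
The abstract does not spell out ALuMA's selection rule, so the following is only a rough illustration of the greedy idea it builds on: a minimal Python sketch of a pool-based active learning loop in which the common minimal-margin heuristic (query the pool point closest to the current separator) stands in for greedy version-space bisection. The function name `greedy_active_learning`, the `oracle` callback, and the perceptron-style refit are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def greedy_active_learning(pool, oracle, budget, seed=0):
    """Illustrative greedy pool-based active learner for halfspaces.

    Each round queries the pool point the current linear separator is
    least certain about (smallest |<w, x>|), a cheap stand-in for the
    greedy version-space bisection rule. `oracle(i)` returns the true
    +1/-1 label of pool example i. NOT the ALuMA algorithm itself.
    """
    rng = np.random.default_rng(seed)
    n, d = pool.shape
    labeled = {}                      # queried index -> label
    first = int(rng.integers(n))     # seed with one random query
    labeled[first] = oracle(first)
    w = labeled[first] * pool[first]  # initial halfspace estimate

    for _ in range(budget - 1):
        unlabeled = [i for i in range(n) if i not in labeled]
        if not unlabeled:
            break
        # Greedy choice: the unlabeled point closest to the boundary.
        scores = np.abs(pool[unlabeled] @ w)
        i = unlabeled[int(np.argmin(scores))]
        labeled[i] = oracle(i)
        # Refit with perceptron passes over the labeled set so far.
        for _ in range(100):
            updated = False
            for j, y in labeled.items():
                if y * (pool[j] @ w) <= 0:
                    w = w + y * pool[j]
                    updated = True
            if not updated:
                break
    return w, labeled

# Hypothetical usage on a linearly separable synthetic pool.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_star = rng.standard_normal(5)
w_hat, queried = greedy_active_learning(
    X, lambda i: int(np.sign(X[i] @ w_star)), budget=20)
```

Note that minimal-margin querying only loosely approximates halving the version space; as we understand the paper, ALuMA approximates the volume-splitting query directly, which is what underlies the approximation guarantee stated above. The sketch deliberately stops short of that.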