Self-training Converts Weak Learners to Strong Learners in Mixture Models

We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions, among others. We show that there exists a universal constant $C_{\mathrm{err}} > 0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon > 0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat{y} = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x} \rangle)$ and using at most $\tilde{O}(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together, our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde{O}(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
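To make the pipeline concrete, here is a minimal numerical sketch of the two-stage procedure described above: a weak pseudolabeler is obtained by gradient descent on the logistic loss over a small labeled set, and is then refined by iterative self-training with pseudolabels $\mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x} \rangle)$ on unlabeled data. The Gaussian mixture, step sizes, iteration counts, and sample sizes below are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic mixture: x = y * mu + noise, y uniform on {-1, +1}.
# (A Gaussian mixture is one instance of the rotationally symmetric,
#  log-concave-type distributions the paper covers.)
d = 50
mu = np.ones(d) / np.sqrt(d)          # unit-norm mean direction

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.standard_normal((n, d))
    return x, y

def logistic_grad(beta, x, y):
    # Gradient of mean_i log(1 + exp(-y_i <beta, x_i>)).
    margins = y * (x @ beta)
    return -(x * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)

# Stage 1: weak pseudolabeler from O(d) labeled examples via
# gradient descent on the logistic loss.
x_lab, y_lab = sample(4 * d)
beta = np.zeros(d)
for _ in range(200):
    beta -= 0.5 * logistic_grad(beta, x_lab, y_lab)

# Stage 2: iterative self-training on unlabeled data. Each round
# pseudolabels the unlabeled set with sgn(<beta_t, x>) and updates
# beta by gradient descent on the logistic loss with those labels.
x_unlab, y_true = sample(20_000)      # true labels kept only to measure error
for t in range(50):
    pseudo = np.sign(x_unlab @ beta)
    pseudo[pseudo == 0] = 1.0
    beta -= 0.5 * logistic_grad(beta, x_unlab, pseudo)

err = np.mean(np.sign(x_unlab @ beta) != y_true)
bayes_err = np.mean(np.sign(x_unlab @ mu) != y_true)
print(f"self-trained error {err:.3f} vs. Bayes direction error {bayes_err:.3f}")
```

Under these assumptions the self-trained classifier's error should approach that of the Bayes-optimal direction $\boldsymbol{\mu}$ as the unlabeled sample grows, mirroring the $\tilde{O}(d/\varepsilon^2)$ unlabeled-sample guarantee stated above.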