A Boosting Algorithm for Positive-Unlabeled Learning
Positive-unlabeled (PU) learning deals with binary classification problems in which only positive (P) and unlabeled (U) data are available. Many PU methods based on linear models and neural networks have been proposed; however, boosting algorithms for PU learning remain little studied, even though a traditional boosting algorithm with simple base learners may outperform neural networks. We propose Ada-PU, a novel boosting algorithm for PU learning, and compare it against neural networks. Ada-PU follows the general procedure of AdaBoost, while regarding P data as positive and negative simultaneously. Instead of the single distribution used in ordinary supervised (PN) learning, Ada-PU maintains and updates three distributions over the PU data. After a weak classifier is learned on the newly updated distribution, its weight in the final ensemble is estimated using only PU data. We demonstrate that, with a defined set of base classifiers, the proposed method is guaranteed to retain three theoretical properties of boosting algorithms. In experiments, we show that Ada-PU outperforms neural networks on benchmark PU datasets. We also study UNSW-NB15, a real-world cyber security dataset, and demonstrate that Ada-PU achieves superior performance in malicious activity detection.
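The abstract does not give Ada-PU's update rules. As a rough illustration of how P data can count as positive and negative at once, and why that yields three weight distributions, the sketch below runs AdaBoost-style rounds on the standard unbiased PU risk with exponential loss, prior * mean_P exp(-F) + mean_U exp(F) - prior * mean_P exp(F). It assumes a known class prior `prior`, 1-D inputs, and threshold stumps as weak learners. The names `pu_boost` and `stump`, the stump-selection rule, and the capped early stop are hypothetical choices for illustration; this is not the paper's Ada-PU and does not carry its theoretical guarantees.

```python
import math


def stump(threshold, sign):
    """1-D decision stump: +sign if x > threshold, else -sign."""
    return lambda x: sign if x > threshold else -sign


def pu_boost(xp, xu, prior, rounds=10, eps=1e-10):
    """Hypothetical AdaBoost-style learner on the unbiased PU risk
    (an illustration only, NOT the Ada-PU algorithm from the paper).

    With exponential loss, the empirical unbiased PU risk of a score F is
        prior * mean_P exp(-F) + mean_U exp(F) - prior * mean_P exp(F),
    so P points appear twice: as positives, and (with a minus sign) as
    negatives. `prior` = P(y = +1) is assumed to be known.
    """
    ensemble = []  # list of (alpha, weak classifier)

    def score(x):
        return sum(alpha * h(x) for alpha, h in ensemble)

    thresholds = sorted(set(xp) | set(xu))
    for _ in range(rounds):
        # Three unnormalized weight sets induced by the current ensemble.
        w_pos = [prior / len(xp) * math.exp(-score(x)) for x in xp]  # P as +1
        w_unl = [1.0 / len(xu) * math.exp(score(x)) for x in xu]     # U as -1
        w_neg = [prior / len(xp) * math.exp(score(x)) for x in xp]   # P as -1, subtracted

        best = None
        for t in thresholds:
            for s in (1, -1):
                h = stump(t, s)
                # Risk of F + alpha*h equals c*exp(-alpha) + e*exp(alpha).
                c = (sum(w for x, w in zip(xp, w_pos) if h(x) == 1)
                     + sum(w for x, w in zip(xu, w_unl) if h(x) == -1)
                     - sum(w for x, w in zip(xp, w_neg) if h(x) == -1))
                e = (sum(w for x, w in zip(xp, w_pos) if h(x) == -1)
                     + sum(w for x, w in zip(xu, w_unl) if h(x) == 1)
                     - sum(w for x, w in zip(xp, w_neg) if h(x) == 1))
                # c + e is the current risk, so maximizing c - e minimizes e.
                if c > 0 and (best is None or c - e > best[0]):
                    best = (c - e, c, e, h)

        if best is None or best[0] <= 0:
            break  # no stump lowers the estimated risk
        _, c, e, h = best
        if e <= eps:
            # Zero (or negative) weighted PU error: cap alpha and stop.
            ensemble.append((0.5 * math.log(c / eps), h))
            break
        ensemble.append((0.5 * math.log(c / e), h))  # minimizer of c*e^-a + e*e^a

    return lambda x: 1 if score(x) > 0 else -1


if __name__ == "__main__":
    clf = pu_boost([2.0, 2.5, 3.0, 3.5], [-3.0, -2.5, -2.0, -1.5, 2.2, 3.1], prior=1/3)
    print(clf(3.0), clf(-2.0))  # → 1 -1
```

The minus-signed P-as-negative term is what lets the learner use P and U data alone; because it can drive the risk estimate below zero, the sketch stops as soon as no stump has a positive weighted margin, a crude stand-in for whatever safeguards the paper's defined set of base classifiers provides.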