OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data
Semi-supervised learning (SSL) is one of the most promising paradigms for circumventing the high labeling cost of building high-performance models. Most existing SSL methods assume that labeled and unlabeled data are drawn from the same (class) distribution. In practice, however, unlabeled data may include out-of-class samples, i.e., samples that cannot be assigned a one-hot label from the closed set of classes in the labeled data; in other words, the unlabeled data form an open set. In this paper, we introduce OpenCoS, a method for handling this realistic semi-supervised learning scenario based on a recent framework of self-supervised visual representation learning. Specifically, we first observe that the out-of-class samples in the open-set unlabeled dataset can be identified effectively via self-supervised contrastive learning. OpenCoS then uses this information to overcome the failure modes of existing state-of-the-art semi-supervised methods, assigning one-hot pseudo-labels to the identified in-class unlabeled samples and soft labels to the identified out-of-class samples. Extensive experiments demonstrate the effectiveness of OpenCoS, which adapts state-of-the-art semi-supervised methods to diverse scenarios involving open-set unlabeled data.
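The detect-then-label idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes embeddings already produced by some contrastively trained encoder, scores each unlabeled sample by its maximum cosine similarity to class prototypes (the mean direction of the labeled embeddings per class), and treats samples below a fixed `threshold` as out-of-class. The helper names (`class_prototypes`, `assign_targets`) and the `threshold` and `temperature` parameters are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a nonzero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two nonzero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def class_prototypes(embeddings: Sequence[Vector], labels: Sequence[int],
                     num_classes: int) -> List[List[float]]:
    """Hypothetical prototype: renormalized sum of unit-norm labeled embeddings.

    Assumes every class has at least one labeled sample.
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    for z, y in zip(embeddings, labels):
        for i, x in enumerate(normalize(z)):
            sums[y][i] += x
    return [normalize(s) for s in sums]


def assign_targets(unlabeled: Sequence[Vector], prototypes: Sequence[Vector],
                   threshold: float,
                   temperature: float) -> List[Tuple[str, List[float]]]:
    """Return ("in", one-hot) or ("out", soft label) for each unlabeled embedding."""
    targets = []
    for z in unlabeled:
        sims = [cosine(z, p) for p in prototypes]
        score = max(sims)  # detection score: similarity to the nearest prototype
        if score >= threshold:
            # Treated as in-class: one-hot pseudo-label for the nearest class.
            k = sims.index(score)
            targets.append(("in", [1.0 if i == k else 0.0 for i in range(len(sims))]))
        else:
            # Treated as out-of-class: temperature-scaled softmax over similarities.
            logits = [s / temperature for s in sims]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(x - m) for x in logits]
            total = sum(exps)
            targets.append(("out", [e / total for e in exps]))
    return targets


if __name__ == "__main__":
    labeled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    protos = class_prototypes(labeled, [0, 1, 0, 1], num_classes=2)
    for target in assign_targets([[1.0, 0.0], [1.0, 1.0]], protos,
                                 threshold=0.9, temperature=0.1):
        print(target)
```

A fixed threshold keeps the sketch short; how the cut-off between in-class and out-of-class samples is actually chosen is an assumption here, not something stated in the abstract.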