Most recent neural semi-supervised learning (SSL) algorithms rely on adding small perturbations to either the input vectors or their representations. These methods have been successful on computer vision tasks, since images form a continuous manifold, but they are not appropriate for discrete inputs such as sentences. To adapt these methods to text input, we propose to decompose a neural network M into two components F and U so that M = U ∘ F. The layers in F are then frozen, and only the layers in U are updated for most of the training. In this way, F serves as a feature extractor that maps the input to a high-level representation and adds systematic noise using dropout. We can then train U with any state-of-the-art SSL algorithm, such as the Π-model, temporal ensembling, or mean teacher. Furthermore, this gradual unfreezing schedule also prevents a pretrained model from catastrophic forgetting. The experimental results demonstrate that our approach provides improvements over state-of-the-art methods, especially on short texts.
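To make the setup concrete, below is a minimal PyTorch-style sketch (not the authors' code) of the layer-partitioning idea: the network M is split into a frozen feature extractor F, whose dropout supplies the stochastic perturbation, and trainable upper layers U, which are trained with a Π-model-style consistency loss on unlabeled data plus a supervised loss on labeled data. All module names, layer sizes, and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F_

class FeatureExtractor(nn.Module):            # F: frozen, noisy via dropout
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.drop = nn.Dropout(0.3)           # systematic noise source

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))
        return self.drop(h[:, -1])            # last hidden state as the representation

class Classifier(nn.Module):                  # U: the only part updated early in training
    def __init__(self, hidden=256, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, z):
        return self.fc(z)

F_net, U_net = FeatureExtractor(), Classifier()
for p in F_net.parameters():                  # freeze F; only U is optimized
    p.requires_grad = False
opt = torch.optim.Adam(U_net.parameters(), lr=1e-3)

def train_step(x_lab, y_lab, x_unlab, cons_weight=1.0):
    F_net.train()                             # keep dropout active in F during training
    # supervised term on labeled data
    logits = U_net(F_net(x_lab))
    loss = F_.cross_entropy(logits, y_lab)
    # Π-model-style consistency: two noisy views of the same unlabeled input
    p1 = F_.softmax(U_net(F_net(x_unlab)), dim=-1)
    with torch.no_grad():                     # second pass is treated as the target
        p2 = F_.softmax(U_net(F_net(x_unlab)), dim=-1)
    loss = loss + cons_weight * F_.mse_loss(p1, p2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

A gradual unfreezing schedule, as mentioned in the abstract, would simply move layers of F_net back into the optimizer (setting requires_grad = True) as training progresses; that scheduling detail is omitted from this sketch.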