Large-scale multi-label datasets are becoming readily available, and the demand for large-scale multi-label classification algorithms is increasing accordingly. In this work, we investigate the limitations of a neural network (NN) architecture that aims to minimize the pairwise ranking error, and propose instead to apply a rather simple NN approach, combined with recently proposed learning techniques, to large-scale multi-label text classification tasks. Additionally, we present a simple threshold predictor that converts the real-valued NN outputs into binary label assignments. Our experimental results show that simple NN models equipped with recent advanced techniques such as rectified linear units, dropout, and AdaGrad perform as well as, or even outperform, the previous state-of-the-art approaches on six large-scale textual datasets with diverse characteristics.
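To make the recipe concrete, the following is a minimal sketch, assuming a single hidden layer in PyTorch with synthetic data; the layer sizes, learning rate, and the fixed 0.5 decision cutoff are illustrative assumptions, not the paper's settings (the paper learns a threshold predictor rather than using a fixed cutoff).

```python
# Minimal sketch (not the authors' code): a simple feedforward NN with
# rectified linear units, dropout, and AdaGrad, trained with a per-label
# binary cross-entropy loss for multi-label classification.
import torch
import torch.nn as nn

num_features, num_labels, hidden = 5000, 100, 512  # assumed sizes

model = nn.Sequential(
    nn.Linear(num_features, hidden),
    nn.ReLU(),                 # rectified linear units
    nn.Dropout(p=0.5),         # dropout regularization
    nn.Linear(hidden, num_labels),
)
loss_fn = nn.BCEWithLogitsLoss()                               # per-label cross-entropy
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)   # AdaGrad updates

# Synthetic batch standing in for document feature vectors and
# sparse binary label vectors.
x = torch.randn(32, num_features)
y = (torch.rand(32, num_labels) < 0.05).float()

for _ in range(10):  # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Binarize the real-valued outputs; the fixed 0.5 cutoff here is a
# placeholder for the paper's learned threshold predictor.
with torch.no_grad():
    scores = torch.sigmoid(model(x))
    predictions = (scores > 0.5).int()
```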