This master's thesis describes an algorithm for the automated categorization of scientific documents using deep learning techniques and compares its results with those of existing classification algorithms. As an additional goal, a reusable API is developed that allows classification tasks to be automated in existing software. A design is proposed that uses a convolutional neural network as the classifier and integrates it into a REST-based API. This design then serves as the basis for a proof-of-concept implementation, which is also presented in this thesis. It is shown that the deep learning classifier provides very good results in the context of multi-class document categorization and that it is feasible to integrate such classifiers into a larger ecosystem using REST-based services.