Writer Identification and Writer Retrieval Based on NetVLAD with Re-ranking

This paper addresses writer identification and writer retrieval which is considered as a challenging problem in the document analysis and recognition field. In this work, a novel pipeline is proposed for the problem at hand by employing a unified neural network architecture consisting of the ResNet-20 as a feature extractor and an integrated NetVLAD layer, inspired by the vector of locally aggregated descriptors (VLAD), in the head of the latter part. Having defined this architecture, the triplet semi-hard loss function is used to directly learn an embedding for individual input image patches. Subsequently, generalized max-pooling technique is employed for the aggregation of embedded descriptors of each handwritten image. Also, a novel re-ranking strategy is introduced for the task of identification and retrieval based on -reciprocal nearest neighbors, and it is shown that the pipeline can benefit tremendously from this step. Experimental evaluation has been done on the three publicly available datasets: the ICDAR 2013, CVL, and KHATT datasets. Results indicate that while we perform comparably to the state-of-the-art on the KHATT, our writer identification and writer retrieval pipeline achieves superior performance on the ICDAR 2013 and CVL datasets in terms of mAP.
View on arXiv