Unconstrained text recognition is an active area of pattern recognition. It remains an open research problem because of the unlimited vocabulary, the multiple styles and mixed fonts, and the great morphological variability of text. Recent trends suggest that recognition can be improved by adopting new representations of the extracted features. In this paper, we propose a novel feature extraction model that learns a Bag of Features framework for text recognition based on a Sparse Auto-Encoder. Hidden Markov Models are then used for sequence modeling. To evaluate the quality of the learned features, we tested the proposed system on two printed-text datasets: the PKHATT text-line images and the APTI word-image benchmark. Our method achieves promising recognition results on both datasets.
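As a rough illustration of how a text image might be turned into a sequence of learned-feature vectors for a sequence model, the following minimal sketch cuts a toy image into overlapping frames and passes each frame through the hidden layer of an auto-encoder. Everything specific here is an assumption made for illustration and is not taken from the paper: the sliding-window framing, the names `sliding_windows` and `encode`, the frame width and step, and the untrained random weights. Training the sparse auto-encoder, building the Bag of Features representation, and fitting the Hidden Markov Models are not shown.

```python
import math
import random


def sliding_windows(image, width, step):
    """Cut an image (a list of pixel rows) into overlapping vertical frames.

    Hypothetical helper: the framing scheme is an assumption, not the paper's.
    """
    n_cols = len(image[0])
    frames = []
    for start in range(0, n_cols - width + 1, step):
        # Flatten the frame row by row into a single vector.
        frames.append([row[c] for row in image for c in range(start, start + width)])
    return frames


def encode(frame, weights, biases):
    """Hidden-layer activations of an auto-encoder: sigmoid(W x + b)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(w_row, frame)) + b)))
        for w_row, b in zip(weights, biases)
    ]


# Toy 4x10 binary "image" and randomly initialised (untrained) encoder weights.
rng = random.Random(0)
image = [[rng.randint(0, 1) for _ in range(10)] for _ in range(4)]
width, step, n_hidden = 4, 2, 3  # illustrative values only
weights = [[rng.uniform(-0.5, 0.5) for _ in range(4 * width)] for _ in range(n_hidden)]
biases = [0.0] * n_hidden

frames = sliding_windows(image, width, step)
# One feature vector per frame; a sequence model would consume this list in order.
observations = [encode(f, weights, biases) for f in frames]
```

In a real system the encoder weights would come from training with a sparsity penalty, and the resulting observation sequence would be passed to the sequence model instead of being left as a plain list.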