
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

Abstract

Data-driven saliency has recently gained considerable attention thanks to the use of Convolutional Neural Networks. In this paper we go beyond the standard approach to saliency prediction, in which gaze maps are computed with a feed-forward network, and present a novel Saliency Attentive Model that predicts accurate saliency maps by incorporating attentive mechanisms. Our solution is composed of a Convolutional LSTM that iteratively focuses on the most salient regions of the input and a Residual Architecture designed to preserve spatial resolution. Additionally, to tackle the center bias present in human eye fixations, our model incorporates prior maps generated by learned Gaussian functions. Through an extensive evaluation, we show that the proposed architecture outperforms the current state of the art on three public saliency prediction datasets: SALICON, MIT300 and CAT2000. We further study the contribution of each key component and demonstrate their robustness in different scenarios.
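The abstract states that center bias is handled by prior maps generated from learned Gaussian functions. As a rough illustration only, the NumPy sketch below builds a stack of 2D Gaussian maps from given means and standard deviations on a normalized [0, 1] grid and appends them to a feature tensor along the channel axis. Several details are assumptions rather than facts taken from the abstract: the axis-aligned (diagonal covariance) form, the grid normalization, the peak-to-one rescaling, the concatenation step, and the names `gaussian_prior_maps` and `append_priors`. In the model the Gaussian parameters would be learned; here they are passed in as fixed values.

```python
import numpy as np


def gaussian_prior_maps(mu_x, mu_y, sigma_x, sigma_y, height, width):
    """Build N axis-aligned 2D Gaussian maps with shape (N, height, width).

    Each argument is a length-N sequence on a normalized [0, 1] grid.
    In a trained model these parameters would be learned; here they are
    fixed inputs (illustrative assumption).
    """
    ys = np.linspace(0.0, 1.0, height)
    xs = np.linspace(0.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # each of shape (H, W)

    # Reshape parameters to (N, 1, 1) so they broadcast against the (H, W) grid.
    mx = np.asarray(mu_x, dtype=float)[:, None, None]
    my = np.asarray(mu_y, dtype=float)[:, None, None]
    sx = np.asarray(sigma_x, dtype=float)[:, None, None]
    sy = np.asarray(sigma_y, dtype=float)[:, None, None]

    # Unnormalized Gaussian with diagonal covariance (assumed form).
    maps = np.exp(-((xx - mx) ** 2 / (2 * sx ** 2) + (yy - my) ** 2 / (2 * sy ** 2)))

    # Rescale each map so its peak is 1 (normalization choice is an assumption);
    # the epsilon guards against division by zero if a map underflows.
    peaks = maps.max(axis=(1, 2), keepdims=True)
    return maps / (peaks + 1e-12)


def append_priors(features, priors):
    """Concatenate (C, H, W) features with (N, H, W) priors into (C + N, H, W)."""
    return np.concatenate([features, priors], axis=0)


if __name__ == "__main__":
    # Two hypothetical priors: one centered, one shifted up and to the left.
    priors = gaussian_prior_maps([0.5, 0.3], [0.5, 0.4], [0.2, 0.1], [0.2, 0.15], 30, 40)
    features = np.zeros((512, 30, 40))  # placeholder feature tensor
    stacked = append_priors(features, priors)
    print(stacked.shape)  # (514, 30, 40)
```

Stacking the priors as extra channels lets later convolutional layers weight the center-bias cues alongside image features; whether the actual model combines them this way is not stated in the abstract.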
