
ActionFlowNet: Learning Motion Representation for Action Recognition

Abstract

Even with the recent advances of convolutional neural networks (CNNs) in various visual recognition tasks, state-of-the-art action recognition systems still rely on hand-crafted motion features such as optical flow to achieve the best performance. We propose ActionFlowNet, a multitask learning model that trains a single-stream network directly from raw pixels to jointly estimate optical flow while recognizing actions, capturing both appearance and motion in a single model. We additionally provide insights into how the quality of the learned optical flow affects action recognition. Our model not only improves action recognition accuracy by a large margin (17%) compared to state-of-the-art CNN-based action recognition models trained without external large-scale data and additional optical flow input, but also produces the optical flow as a by-product.
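The multitask objective described above, where one network is trained on a weighted sum of an action-classification loss and an optical-flow-estimation loss, can be sketched in plain Python. This is an illustrative sketch only: the helper names, the cross-entropy/endpoint-error choice of losses, and the `flow_weight` parameter are assumptions for exposition, not details taken from the paper.

```python
import math

def cross_entropy(scores, true_class):
    """Softmax cross-entropy loss for the action-recognition head.

    `scores` is a list of raw class scores (logits); `true_class`
    is the index of the ground-truth action class.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[true_class]

def endpoint_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and reference flow vectors.

    Each flow is a list of (u, v) displacement vectors, one per pixel.
    """
    total = sum(math.hypot(pu - gu, pv - gv)
                for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow))
    return total / len(pred_flow)

def multitask_loss(scores, true_class, pred_flow, gt_flow, flow_weight=0.5):
    """Joint objective: action loss plus a weighted flow-estimation loss.

    A single network would produce both `scores` and `pred_flow` from the
    same raw-pixel input; `flow_weight` balances the two tasks and is an
    assumed hyperparameter here.
    """
    return (cross_entropy(scores, true_class)
            + flow_weight * endpoint_error(pred_flow, gt_flow))
```

In a real training loop both heads share one backbone, so minimizing this combined loss pushes the shared features to encode motion as well as appearance, which is the core idea of the single-stream multitask design.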
