
CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

Abstract

In this paper, we develop a deep neural network architecture called "CSVideoNet" that can learn visual representations from random measurements for compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end trainable and non-iterative model that combines convolutional neural networks (CNNs) with a recurrent neural network (RNN) to facilitate video reconstruction by leveraging temporal-spatial features. The proposed network can accept random measurements with a multi-level compression ratio (CR). The lightly and aggressively compressed measurements offer background information and object details, respectively. This is similar to the variable bit rate techniques widely used in conventional video coding approaches. The RNN employed by CSVideoNet can leverage the temporal coherence that exists in adjacent video frames to extrapolate motion features and merge them with spatial visual features extracted by the CNNs to further enhance reconstruction quality, especially at high CRs. We evaluate CSVideoNet on the UCF-101 dataset. Experimental results show that CSVideoNet outperforms existing video CS reconstruction approaches. The results demonstrate that our method can preserve fine visual details from the original videos even at a 100x CR, which is difficult to achieve with the reference approaches. In addition, the non-iterative nature of CSVideoNet reduces runtime by three orders of magnitude compared with iterative reconstruction algorithms. Furthermore, CSVideoNet can raise the CR of CS cameras beyond the limits of conventional approaches, reducing the bandwidth required for data transmission. These benefits are especially favorable for high-frame-rate video applications.

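To make the described architecture concrete, below is a minimal, hypothetical sketch of a CSVideoNet-style model in PyTorch. It is not the authors' released code: the layer sizes, measurement counts (roughly 5x CR for the key frame and 100x CR for non-key frames of a 32x32 block), group length, and the use of a plain LSTM are all assumptions chosen for illustration of the CNN-plus-RNN fusion idea described in the abstract.

```python
# Illustrative sketch only: a CNN + LSTM fusion for video CS reconstruction.
# All dimensions and measurement counts are assumed, not taken from the paper.
import torch
import torch.nn as nn

class CSVideoNetSketch(nn.Module):
    """Toy CNN + RNN model: the key frame is lightly compressed (low CR),
    the remaining frames in a group are aggressively compressed (high CR),
    and an LSTM fuses per-frame features to exploit temporal coherence."""

    def __init__(self, key_meas=204, nonkey_meas=10, frame_size=32, hidden=1024):
        super().__init__()
        self.frame_size = frame_size
        pixels = frame_size * frame_size
        # Fully connected layers map random measurements to a coarse frame.
        self.key_fc = nn.Linear(key_meas, pixels)      # low CR: more measurements
        self.nonkey_fc = nn.Linear(nonkey_meas, pixels)  # high CR: few measurements
        # Shared convolutional stack refines each coarse frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )
        # LSTM fuses features across the frame group; a linear head emits pixels.
        self.rnn = nn.LSTM(input_size=pixels, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, pixels)

    def forward(self, key_y, nonkey_y):
        # key_y: (B, key_meas); nonkey_y: (B, T-1, nonkey_meas)
        b, t_minus_1, _ = nonkey_y.shape
        frames = [self.key_fc(key_y)]
        frames += [self.nonkey_fc(nonkey_y[:, i]) for i in range(t_minus_1)]
        feats = []
        for f in frames:
            img = f.view(b, 1, self.frame_size, self.frame_size)
            feats.append(self.cnn(img).view(b, -1))
        seq = torch.stack(feats, dim=1)        # (B, T, pixels)
        h, _ = self.rnn(seq)                   # temporal fusion over the group
        recon = self.out(h)                    # (B, T, pixels)
        return recon.view(b, t_minus_1 + 1, self.frame_size, self.frame_size)

# Example: a group of 10 frames of 32x32 blocks.
model = CSVideoNetSketch()
key_y = torch.randn(2, 204)        # lightly compressed key-frame measurements
nonkey_y = torch.randn(2, 9, 10)   # aggressively compressed non-key measurements
print(model(key_y, nonkey_y).shape)  # torch.Size([2, 10, 32, 32])
```

In this sketch the key frame carries most of the background information while the non-key frames contribute motion cues, and the LSTM propagates the richer key-frame features to the heavily compressed frames, mirroring the multi-level CR strategy the abstract describes.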