DeepRebirth: Accelerating Deep Neural Network Execution on Mobile
Devices
Deploying deep neural networks on mobile devices is a challenging task due to computational complexity and memory intensity. Current model reduction methods (e.g., matrix approximation using SVD) cannot satisfy real-time processing requirements. This paper first identifies that the major obstacle is the excessive execution time of non-tensor layers (i.e., layers without tensor-type weight parameters) such as pooling and normalization. This motivates us to design a novel acceleration framework, DeepRebirth, which "slims" existing consecutive and parallel non-tensor and tensor layers. The layer slimming is executed on different substructures: (a) streamline slimming, which merges consecutive non-tensor and tensor layers vertically; and (b) branch slimming, which merges non-tensor and tensor branches horizontally. These optimizations significantly accelerate model execution and reduce run-time memory cost. To minimize accuracy loss, the parameters of the newly generated layers are learned with layer-wise fine-tuning, guided by both theoretical analysis and empirical verification. In our experiments, DeepRebirth achieves a 3x-5x speed-up and energy saving on GoogLeNet with only a 0.4% drop in top-5 accuracy on ImageNet. Furthermore, combined with other model compression techniques, DeepRebirth achieves an average forwarding time of 65 ms per image on a Samsung Galaxy S6 with 86.54% top-5 accuracy and a 2.5x run-time memory saving.
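The streamline-slimming idea can be illustrated with the classic case of folding a normalization layer into the preceding convolution or fully-connected layer: since batch normalization at inference time is an affine per-channel transform, it can be absorbed into the previous layer's weights and bias. The sketch below (function name, shapes, and NumPy formulation are illustrative assumptions, not the paper's API) shows this merge for a linear layer; the same per-output-channel algebra applies to each filter of a convolution.

```python
import numpy as np

def fold_bn_into_linear(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-time batch-norm into the preceding layer.

    Illustrative sketch of vertically merging a tensor layer with a
    following non-tensor (normalization) layer.
    W: (out, in), b: (out,); gamma, beta, mean, var: (out,)
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel scale
    W_f = W * scale[:, None]             # absorb the scale into the weights
    b_f = (b - mean) * scale + beta      # absorb the shift into the bias
    return W_f, b_f

# Usage: the single folded layer reproduces linear -> batch-norm exactly.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)); b = rng.normal(size=4)
gamma = rng.uniform(0.5, 1.5, 4); beta = rng.normal(size=4)
mean = rng.normal(size=4); var = rng.uniform(0.5, 1.5, 4)
x = rng.normal(size=(8, 3))

y_two_layers = ((x @ W.T + b) - mean) / np.sqrt(var + 1e-5) * gamma + beta
W_f, b_f = fold_bn_into_linear(W, b, gamma, beta, mean, var)
y_folded = x @ W_f.T + b_f
assert np.allclose(y_two_layers, y_folded)
```

The merge removes one layer's worth of memory traffic and kernel launches at inference time, which is where the reported run-time and memory savings come from; the paper additionally fine-tunes the merged layers layer-by-layer to recover accuracy when the merge is not exact (e.g., across non-linear layers).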