
Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks

Main: 10 pages
Appendix: 15 pages
Bibliography: 3 pages
Figures: 7
Tables: 16
Abstract

Neural networks are widely used for image-related tasks but typically demand considerable computing power. Once a network has been trained, however, its memory and compute footprint can be reduced by compression. In this work, we focus on compression through tensorization and low-rank representations. Whereas classical approaches search for a low-rank approximation by minimizing an isotropic norm such as the Frobenius norm in weight space, we use data-informed norms that measure the error in function space. Concretely, we minimize the change in the layer's output distribution, which can be expressed as $\lVert (W - \widetilde{W})\,\Sigma^{1/2} \rVert_F$, where $\Sigma^{1/2}$ is the square root of the covariance matrix of the layer's input and $W$, $\widetilde{W}$ are the original and compressed weights. We propose new alternating least squares algorithms for the two most common tensor decompositions (Tucker-2 and CPD) that directly optimize the new norm. Unlike conventional compression pipelines, which almost always require post-compression fine-tuning, our data-informed approach often achieves competitive accuracy without any fine-tuning. We further show that the same covariance-based norm can be transferred from one dataset to another with only a minor accuracy drop, enabling compression even when the original training dataset is unavailable. Experiments on several CNN architectures (ResNet-18/50 and GoogLeNet) and datasets (ImageNet, FGVC-Aircraft, Cifar10, and Cifar100) confirm the advantages of the proposed method.
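To make the norm concrete, the following is a minimal sketch (not the authors' code) of how the data-informed error $\lVert (W - \widetilde{W})\,\Sigma^{1/2} \rVert_F$ could be evaluated for a linearized layer. The names `X`, `W`, `W_tilde`, `covariance_sqrt`, and `distribution_aware_error` are illustrative assumptions; the paper's actual contribution is the alternating least squares algorithms for Tucker-2 and CPD that minimize this norm, which are not reproduced here.

```python
# Sketch only: evaluating the distribution-aware error for a linearized layer,
# assuming `X` holds sampled layer inputs of shape (num_samples, in_features)
# and `W`, `W_tilde` are original/compressed weights of shape (out_features, in_features).
import numpy as np

def covariance_sqrt(X: np.ndarray) -> np.ndarray:
    """Symmetric square root of the (uncentered) input covariance X^T X / N."""
    sigma = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative eigenvalues
    return eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

def distribution_aware_error(W: np.ndarray, W_tilde: np.ndarray, X: np.ndarray) -> float:
    """Frobenius norm of the weight difference, weighted by the input covariance square root."""
    sigma_half = covariance_sqrt(X)
    return float(np.linalg.norm((W - W_tilde) @ sigma_half, ord="fro"))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 64))                 # sampled layer inputs
    W = rng.normal(size=(128, 64))                  # original weights
    W_tilde = W + 0.01 * rng.normal(size=W.shape)   # stand-in for compressed weights
    print("plain Frobenius error:       ", np.linalg.norm(W - W_tilde, ord="fro"))
    print("distribution-aware error:    ", distribution_aware_error(W, W_tilde, X))
```

In contrast to the isotropic Frobenius norm, this error weights each input direction by how strongly it is excited by the data, so directions that rarely occur in the input distribution contribute little to the measured compression error.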
