DARC: Differentiable ARchitecture Compression

In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC uses the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can be applied to any neural architecture, and we report experiments on state-of-the-art convolutional neural networks for image classification. For a WideResNet trained on CIFAR-10, we improve single-sample inference speed and memory footprint with no accuracy loss. For a ResNet trained on ImageNet, we improve batch inference speed and memory footprint with some loss in Top-1 accuracy. We also give theoretical Rademacher complexity bounds in simplified cases, showing how DARC avoids overfitting despite over-parameterization.
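To make the paradigm concrete, the sketch below illustrates one way a differentiable choice between an expensive sub-component and a cheaper alternative can be set up. It is a minimal illustration, not the paper's implementation: the candidate operations, the softmax relaxation, and the resource penalty are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChoiceBlock(nn.Module):
    """Hypothetical DARC-style block: a softmax-weighted mixture of the
    original, expensive operation and a cheaper candidate replacement.
    The mixture logits (alpha) are trained jointly with the weights."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.candidates = nn.ModuleList([
            # Candidate 0: the original 3x3 convolution.
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            # Candidate 1: an assumed cheaper depthwise-separable replacement.
            nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
                nn.Conv2d(in_ch, out_ch, 1),
            ),
        ])
        # One architecture logit per candidate.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.candidates))

def resource_penalty(blocks, costs, strength=1e-3):
    """Differentiable surrogate for inference cost: the expected cost of the
    selected candidates under the softmax architecture weights.
    `costs` is a list of tensors, one per block, holding each candidate's
    relative cost (illustrative values, not measured numbers)."""
    expected = sum((F.softmax(b.alpha, dim=0) * c).sum()
                   for b, c in zip(blocks, costs))
    return strength * expected
```

In such a setup, the penalty is added to the task loss during training, and after convergence each block keeps only its highest-weighted candidate, yielding the compressed architecture.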