Trusting SVM for Piecewise Linear CNNs

7 November 2016

Abstract

We present a novel layerwise optimization algorithm for the learning objective of Piecewise-Linear Convolutional Neural Networks (PL-CNNs), a large class of convolutional neural networks. Specifically, PL-CNNs employ piecewise linear non-linearities such as the commonly used ReLU and max-pool, and an SVM classifier as the final layer. The key observation of our approach is that the problem corresponding to the parameter estimation of a layer can be formulated as a difference-of-convex (DC) program, which happens to be a latent structured SVM. We optimize the DC program using the concave-convex procedure, which requires us to iteratively solve a structured SVM problem. To this end, we extend the block-coordinate Frank-Wolfe (BCFW) algorithm in three important ways: (i) we include a trust region for the parameters, which allows us to use the previous parameters as an initialization; (ii) we reduce the memory requirement of BCFW by potentially several orders of magnitude for the dense layers, which enables us to learn a large set of parameters; and (iii) we observe that, empirically, the optimal solution of the structured SVM problem can be obtained efficiently by solving a subproblem which contains only a small fraction of the constraints. Using publicly available data sets, we show that our approach outperforms the state-of-the-art variants of backpropagation for learning PL-CNNs.
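To illustrate the concave-convex procedure mentioned in the abstract, the following is a minimal one-dimensional sketch and not the paper's algorithm. It assumes a toy DC objective f(x) = u(x) - v(x) with u(x) = x^2 and a piecewise linear convex v(x) = max(0, 3x) + max(0, -x). The names `u`, `v`, `v_subgradient`, and `cccp` are hypothetical. In the paper each convex subproblem is a structured SVM solved with BCFW; here the subproblem has a closed-form solution.

```python
def u(x):
    # Convex part of the toy objective (assumed for illustration).
    return x * x


def v(x):
    # Convex, piecewise linear part; the objective subtracts it.
    return max(0.0, 3.0 * x) + max(0.0, -x)


def v_subgradient(x):
    # One valid subgradient of v at x (0 lies in [-1, 3] at the kink).
    if x > 0:
        return 3.0
    if x < 0:
        return -1.0
    return 0.0


def cccp(x0, max_iters=50, tol=1e-12):
    """Concave-convex procedure on f(x) = u(x) - v(x).

    Each iteration linearizes v at the current point and minimizes
    the convex upper bound u(x) - g * x, whose minimizer is g / 2.
    """
    x = x0
    history = [u(x) - v(x)]
    for _ in range(max_iters):
        g = v_subgradient(x)
        x_new = g / 2.0  # argmin over x of x^2 - g * x
        history.append(u(x_new) - v(x_new))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


if __name__ == "__main__":
    x_star, objective_values = cccp(1.0)
    print(x_star, objective_values)
```

In this toy problem the objective value never increases from one iteration to the next, which is the property that makes the procedure suitable for DC programs. The stationary point reached depends on the starting point, so a start at a negative value ends at a different local solution than a start at a positive one.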
