Soft Diamond Regularizers for Deep Learning
This chapter presents the new family of soft diamond synaptic regularizers based on thick-tailed symmetric alpha stable probability bell curves. These new parametrized weight priors improved deep-learning performance on image and language-translation test sets and increased the sparsity of the trained weights. They outperformed the state-of-the-art hard-diamond Laplacian regularizer of sparse lasso regression and classification. The synaptic weight priors have power-law bell-curve tails that are thicker than the thin exponential tails of Gaussian bell curves that underly ridge regularizers. Their tails get thicker as the parameter decreases. These thicker tails model more impulsive behavior and allow for occasional distant search in synaptic weight spaces of extremely high dimension. The geometry of their constraint sets has a diamond shape. The shape varies from a circle to a star or diamond that depends on the tail thickness and dispersion of the weight prior. These bell curves lack a closed form in general and this makes direct training computationally intensive. We removed this computational bottleneck by using a precomputed look-up table. We tested the soft diamond regularizers with deep neural classifiers on both image test sets and German-to-English language translation. The image simulations used the three datasets CIFAR-10, CIFAR-100, and Caltech-256. The regularizers improved the accuracy and sparsity of the classifiers. We also tested with deep neural machine-translation models on the IWSLT-2016 Evaluation dataset for German-to-English text translation. They also outperformed ridge regularizers and lasso regularizers. These findings recommend the sub-Cauchy soft diamond regularizer as a competitive and sparse regularizer for large-scale machine learning.
View on arXiv