
We propose a technique for augmenting network layers with a linear gating mechanism, which allows identity mappings to be learned by optimizing only a single parameter per layer. We also introduce a new metric that serves as the basis for the technique: it captures the difficulty of learning identity mappings for different types of network models, and it provides new theoretical intuition for the increased depth of models such as Highway and Residual Networks. We propose a new model, the Gated Residual Network, obtained by augmenting Residual Networks with this mechanism. Experimental results show that augmenting layers yields improved performance, fewer issues with depth, and greater layer independence: fully removing a layer does not cripple the model. We evaluate our method on MNIST using fully-connected networks and on CIFAR-10 using Wide ResNets, achieving a relative error reduction of more than 8% on the latter compared to the original model.
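As a rough illustration of the idea, a minimal sketch of such a gated layer is shown below (in PyTorch). The class name, the initialization, and the exact gate form g*f(x) + (1-g)*x are illustrative assumptions rather than the paper's precise specification; the key point is that a single scalar parameter lets the block collapse to the identity.

```python
import torch
import torch.nn as nn

class GatedLayer(nn.Module):
    """Hypothetical sketch: augments a layer f with one scalar gate g so that
    output = g * f(x) + (1 - g) * x.
    Setting g = 0 recovers the identity mapping, so the network can learn to
    bypass the layer by optimizing this single parameter."""
    def __init__(self, layer, g_init=1.0):
        super().__init__()
        self.layer = layer
        # One learnable scalar per augmented layer (assumed initialization).
        self.g = nn.Parameter(torch.tensor(g_init))

    def forward(self, x):
        # Linear interpolation between the layer's output and its input.
        return self.g * self.layer(x) + (1.0 - self.g) * x

# Usage example: wrap a fully-connected layer whose input and output
# dimensions match, so the identity mapping is well-defined.
block = GatedLayer(nn.Linear(64, 64))
y = block(torch.randn(8, 64))
```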