Shapeshifter Networks: Decoupling Layers from Parameters for Scalable and Effective Deep Learning

Abstract

Fitting a model into GPU memory during training is an increasing concern as models continue to grow. To address this issue, we present Shapeshifter Networks (SSNs), a flexible neural network framework that decouples layers from model weights, enabling us to implement any neural network with an arbitrary number of parameters. In SSNs, each layer obtains its weights from a parameter store that decides where and how to allocate parameters to layers. This can result in parameters being shared across layers even when the layers differ in size or perform different operations. SSNs require no modifications to a model's loss function or architecture, making them easy to use. Our approach can create parameter-efficient networks that use a relatively small number of weights, or can improve a model's performance by adding model capacity during training without affecting the computational resources required at test time. We evaluate SSNs on seven network architectures across diverse tasks, including image classification, bidirectional image-sentence retrieval, and phrase grounding, producing high-performing models even when using as little as 1% of the parameters.
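The core idea, layers drawing their weights from a shared parameter store rather than owning them, can be illustrated with a minimal sketch. This is a simplified assumption of how such a store might work (the paper's actual allocation policy is learned); here a `ParameterStore` class, our own hypothetical name, hands out possibly overlapping slices of one flat weight bank, so layers of different shapes can reuse the same underlying parameters:

```python
import numpy as np

class ParameterStore:
    """A shared bank of weights from which layers draw their parameters.

    Hypothetical sketch: the real SSN allocation is decided by the store
    itself; here we simply hand out (possibly overlapping) views of a
    single flat weight bank, so layers of different sizes can share the
    same underlying parameters.
    """
    def __init__(self, num_params, seed=0):
        rng = np.random.default_rng(seed)
        self.bank = rng.standard_normal(num_params) * 0.01

    def get_weights(self, shape):
        needed = int(np.prod(shape))
        # Wrap around the bank when a layer needs more weights than it
        # holds, reusing parameters both across and within layers.
        idx = np.arange(needed) % self.bank.size
        return self.bank[idx].reshape(shape)

store = ParameterStore(num_params=1000)
w1 = store.get_weights((64, 32))   # 2048 weights drawn from a 1000-weight bank
w2 = store.get_weights((128, 16))  # a different layer sharing the same bank
```

Because both layers index into the same bank, the total parameter count stays fixed at the store's size regardless of how many layers the network defines, which is how an SSN can realize a network with far more nominal weights than it actually stores.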
