Model Fusion via Optimal Transport
Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this is often rendered infeasible by resource constraints, as memory and computation costs grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-)align neurons across the models before averaging their associated parameters. We discuss two main strategies for fusing neural networks in this "one-shot" manner, without requiring any retraining. We then show that this significantly outperforms vanilla averaging on convolutional networks (like VGG11), residual networks (like ResNet18), and multi-layer perceptrons, on CIFAR10 and MNIST. Finally, we show applications to transfer tasks (where our fused model even surpasses the performance of both the original models) as well as to model compression. Code will be made available at https://github.com/modelfusion.
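To make the layer-wise idea concrete, here is a minimal sketch (not the authors' implementation) of fusing two multi-layer perceptrons with equal hidden widths. The optimal-transport alignment is simplified to a hard permutation computed with the Hungarian algorithm, which is the special case of OT with uniform marginals; the paper's method also supports soft alignments, unequal widths, and activation-based costs. The function name `fuse_mlps` and the use of raw weight distances as the matching cost are illustrative assumptions.

```python
# Hypothetical sketch of layer-wise fusion of two MLPs via neuron alignment.
# Assumes both models have identical architectures; biases are omitted for brevity.
import numpy as np
from scipy.optimize import linear_sum_assignment


def fuse_mlps(weights_a, weights_b, alpha=0.5):
    """weights_a, weights_b: lists of (out_dim, in_dim) weight matrices, one per layer."""
    fused = []
    perm = np.arange(weights_a[0].shape[1])       # identity ordering over the input features
    for layer, (wa, wb) in enumerate(zip(weights_a, weights_b)):
        wb = wb[:, perm]                          # re-order B's inputs to match the previous layer's alignment
        if layer < len(weights_a) - 1:
            # Cost of matching neuron i of A with neuron j of B:
            # Euclidean distance between their incoming weight vectors.
            cost = np.linalg.norm(wa[:, None, :] - wb[None, :, :], axis=2)
            _, perm = linear_sum_assignment(cost) # hard transport plan = permutation
            wb = wb[perm, :]                      # align B's neurons to A's ordering
        # Average the (now aligned) parameters; the last layer needs no alignment
        # because its output neurons already share a fixed ordering.
        fused.append(alpha * wa + (1 - alpha) * wb)
    return fused


# Toy usage: fuse two randomly initialized 2-hidden-layer MLPs.
rng = np.random.default_rng(0)
shapes = [(16, 8), (16, 16), (4, 16)]
model_a = [rng.normal(size=s) for s in shapes]
model_b = [rng.normal(size=s) for s in shapes]
fused_model = fuse_mlps(model_a, model_b)
print([w.shape for w in fused_model])
```

In the paper, the matching cost can also be built from neuron activations on data rather than raw weights, and the resulting transport plan is used to softly re-weight (rather than permute) the incoming neurons before averaging.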