Model Fusion via Optimal Transport
Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this is often rendered infeasible by resource constraints, as memory and computation costs grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-)align neurons across the models before averaging their associated parameters. We discuss two main strategies for fusing neural networks in this "one-shot" manner, without requiring any retraining. We then show that this significantly outperforms vanilla averaging on convolutional networks (like VGG11), residual networks (like ResNet18), and multi-layer perceptrons, on CIFAR10 and MNIST. Finally, we show applications to transfer tasks (where our fused model even surpasses the performance of both the original models) as well as to model compression. Code will be made available at https://github.com/modelfusion.
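To make the layer-wise idea concrete, here is a minimal sketch (not the authors' implementation) of fusing two multi-layer perceptrons with equal hidden widths. The optimal-transport alignment is simplified to a hard permutation computed with the Hungarian algorithm, which is the special case of OT with uniform marginals; the paper's method also supports soft alignments, unequal widths, and activation-based costs. The function name `fuse_mlps` and the use of raw weight distances as the matching cost are illustrative assumptions.

```python
# Hypothetical sketch of layer-wise fusion of two MLPs via neuron alignment.
# Assumes both models have identical architectures; biases are omitted for brevity.
import numpy as np
from scipy.optimize import linear_sum_assignment


def fuse_mlps(weights_a, weights_b, alpha=0.5):
    """weights_a, weights_b: lists of (out_dim, in_dim) weight matrices, one per layer."""
    fused = []
    perm = np.arange(weights_a[0].shape[1])       # identity ordering over the input features
    for layer, (wa, wb) in enumerate(zip(weights_a, weights_b)):
        wb = wb[:, perm]                          # re-order B's inputs to match the previous layer's alignment
        if layer < len(weights_a) - 1:
            # Cost of matching neuron i of A with neuron j of B:
            # Euclidean distance between their incoming weight vectors.
            cost = np.linalg.norm(wa[:, None, :] - wb[None, :, :], axis=2)
            _, perm = linear_sum_assignment(cost) # hard transport plan = permutation
            wb = wb[perm, :]                      # align B's neurons to A's ordering
        # Average the (now aligned) parameters; the last layer needs no alignment
        # because its output neurons already share a fixed ordering.
        fused.append(alpha * wa + (1 - alpha) * wb)
    return fused


# Toy usage: fuse two randomly initialized 2-hidden-layer MLPs.
rng = np.random.default_rng(0)
shapes = [(16, 8), (16, 16), (4, 16)]
model_a = [rng.normal(size=s) for s in shapes]
model_b = [rng.normal(size=s) for s in shapes]
fused_model = fuse_mlps(model_a, model_b)
print([w.shape for w in fused_model])
```

In the paper, the matching cost can also be built from neuron activations on data rather than raw weights, and the resulting transport plan is used to softly re-weight (rather than permute) the incoming neurons before averaging.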