
Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders

Abstract

Multimodal learning with variational autoencoders (VAEs) requires estimating joint distributions to evaluate the evidence lower bound (ELBO). Current methods, the product of experts and the mixture of experts, aggregate single-modality distributions under a simplifying independence assumption, which is overoptimistic. This research introduces a novel methodology for aggregating single-modality distributions based on the principle of consensus of dependent experts (CoDE), which circumvents this assumption. Using the CoDE method, we propose a novel ELBO that approximates the joint likelihood of the multimodal data by learning the contribution of each subset of modalities. The resulting CoDE-VAE model demonstrates a better trade-off between generative coherence and generative quality, and yields more accurate log-likelihood estimates. CoDE-VAE further narrows the generative quality gap as the number of modalities increases; in certain cases it reaches a generative quality similar to that of unimodal VAEs, a desirable property lacking in most current methods. Finally, the classification accuracy achieved by CoDE-VAE is comparable to that of state-of-the-art multimodal VAE models.
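
To make the contrast concrete, the sketch below shows the standard product-of-experts fusion of diagonal Gaussian experts (the closed-form, independence-based aggregation the abstract critiques) alongside an illustrative subset-weighted ELBO. The function names, tensor shapes, and the softmax weighting over subsets are assumptions made for illustration only; the actual CoDE aggregation rule and objective are defined in the paper itself.

import torch

def poe_fusion(mus, logvars):
    # Product-of-experts fusion of diagonal Gaussian experts, as used by
    # independence-based multimodal VAEs: the joint precision is the sum
    # of per-expert precisions (plus a standard-normal prior expert).
    # mus, logvars: lists of tensors of shape (batch, latent_dim).
    precisions = [torch.ones_like(mus[0])] + [torch.exp(-lv) for lv in logvars]
    weighted_means = [torch.zeros_like(mus[0])] + [m * p for m, p in zip(mus, precisions[1:])]
    joint_precision = torch.stack(precisions).sum(dim=0)
    joint_mu = torch.stack(weighted_means).sum(dim=0) / joint_precision
    joint_logvar = -torch.log(joint_precision)
    return joint_mu, joint_logvar

def subset_weighted_elbo(elbos_per_subset, logits):
    # Illustrative stand-in for "learning the contribution of each subset
    # of modalities": a learnable softmax weight per non-empty modality
    # subset scales that subset's ELBO term. This only sketches the
    # weighting idea, not the CoDE-VAE objective.
    # elbos_per_subset: (num_subsets, batch); logits: (num_subsets,).
    weights = torch.softmax(logits, dim=0)
    return (weights.unsqueeze(1) * elbos_per_subset).sum(dim=0).mean()

The PoE formula follows from multiplying Gaussian densities, which is exactly where the independence assumption enters; a dependent-expert aggregation replaces this closed form with a consensus rule that accounts for correlation between experts.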

@article{mancisidor2025_2505.01134,
  title={Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders},
  author={Rogelio A Mancisidor and Robert Jenssen and Shujian Yu and Michael Kampffmeyer},
  journal={arXiv preprint arXiv:2505.01134},
  year={2025}
}