
Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders

Abstract

Multimodal learning with variational autoencoders (VAEs) requires estimating joint distributions to evaluate the evidence lower bound (ELBO). Current methods, the product of experts and the mixture of experts, aggregate single-modality distributions under a simplifying independence assumption, which is overoptimistic. This research introduces a novel methodology for aggregating single-modality distributions based on the principle of consensus of dependent experts (CoDE), which circumvents this assumption. Using the CoDE method, we propose a novel ELBO that approximates the joint likelihood of the multimodal data by learning the contribution of each subset of modalities. The resulting CoDE-VAE model demonstrates a better trade-off between generative coherence and generative quality, and yields more accurate log-likelihood estimates. CoDE-VAE further narrows the generative quality gap as the number of modalities increases; in certain cases it reaches a generative quality similar to that of unimodal VAEs, a desirable property lacking in most current methods. Finally, the classification accuracy achieved by CoDE-VAE is comparable to that of state-of-the-art multimodal VAE models.
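
To make the contrast concrete, the sketch below shows the standard product-of-experts fusion of diagonal Gaussian experts (the closed-form, independence-based aggregation the abstract critiques) alongside an illustrative subset-weighted ELBO. The function names, tensor shapes, and the softmax weighting over subsets are assumptions made for illustration only; the actual CoDE aggregation rule and objective are defined in the paper itself.

import torch

def poe_fusion(mus, logvars):
    # Product-of-experts fusion of diagonal Gaussian experts, as used by
    # independence-based multimodal VAEs: the joint precision is the sum
    # of per-expert precisions (plus a standard-normal prior expert).
    # mus, logvars: lists of tensors of shape (batch, latent_dim).
    precisions = [torch.ones_like(mus[0])] + [torch.exp(-lv) for lv in logvars]
    weighted_means = [torch.zeros_like(mus[0])] + [m * p for m, p in zip(mus, precisions[1:])]
    joint_precision = torch.stack(precisions).sum(dim=0)
    joint_mu = torch.stack(weighted_means).sum(dim=0) / joint_precision
    joint_logvar = -torch.log(joint_precision)
    return joint_mu, joint_logvar

def subset_weighted_elbo(elbos_per_subset, logits):
    # Illustrative stand-in for "learning the contribution of each subset
    # of modalities": a learnable softmax weight per non-empty modality
    # subset scales that subset's ELBO term. This only sketches the
    # weighting idea, not the CoDE-VAE objective.
    # elbos_per_subset: (num_subsets, batch); logits: (num_subsets,).
    weights = torch.softmax(logits, dim=0)
    return (weights.unsqueeze(1) * elbos_per_subset).sum(dim=0).mean()

The PoE formula follows from multiplying Gaussian densities, which is exactly where the independence assumption enters; a dependent-expert aggregation replaces this closed form with a consensus rule that accounts for correlation between experts.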

@article{mancisidor2025_2505.01134,
  title={Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders},
  author={Rogelio A Mancisidor and Robert Jenssen and Shujian Yu and Michael Kampffmeyer},
  journal={arXiv preprint arXiv:2505.01134},
  year={2025}
}