Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks, particularly semantic segmentation, is critically important yet remains a significant challenge. One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities, a phenomenon known as unimodal dominance or bias. This issue becomes especially problematic in real-world scenarios where the dominant modality is unavailable, resulting in severe performance degradation. To address this, we apply a simple yet effective plug-and-play regularization term based on functional entropy, which introduces no additional parameters or modules. The term balances the contribution of each visual modality to the segmentation result. Specifically, we leverage the log-Sobolev inequality to bound functional entropy by functional Fisher information. By maximizing the information contributed by each visual modality, our approach mitigates unimodal dominance and yields a more balanced and robust segmentation framework. We further propose a multi-scale regularization module that applies this plug-and-play term to both high-level features and segmentation predictions, enabling more balanced multi-modal learning. Extensive experiments on three datasets demonstrate that our method achieves superior performance, with gains of +13.94%, +3.25%, and +3.64%, respectively, without introducing any additional parameters.
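For concreteness, the two quantities the abstract pairs are, in standard notation (ours, not necessarily the paper's), the functional entropy of a positive function f under a measure \mu,

\mathrm{Ent}_{\mu}(f) = \mathbb{E}_{\mu}[f \log f] - \mathbb{E}_{\mu}[f] \log \mathbb{E}_{\mu}[f],

and the log-Sobolev bound that controls it by the functional Fisher information (stated here for a Gaussian measure; constants vary by convention),

\mathrm{Ent}_{\mu}(f) \le \tfrac{1}{2}\, \mathbb{E}_{\mu}\!\left[ \frac{\lVert \nabla f \rVert^{2}}{f} \right] = \tfrac{1}{2}\, \mathcal{I}_{\mu}(f).

Because the right-hand side is differentiable with respect to the network inputs, it gives a tractable surrogate for the entropy when used as a regularizer.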
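As a rough illustration of how such a gradient-based surrogate could be computed and used to balance modalities, here is a minimal PyTorch-style sketch. The function names, the per-image scalar reduction of the segmentation output, and the min-over-modalities balancing rule are all our assumptions for illustration, not the authors' implementation:

import torch
import torch.nn.functional as F

def functional_fisher_info(prob_fn, x, eps=1e-8):
    # Surrogate for I(f) = E[ ||grad_x f(x)||^2 / f(x) ], where prob_fn
    # maps a batch of inputs to one positive scalar per sample (here,
    # for brevity, the mean predicted-class probability of the
    # segmentation map, reduced to a scalar per image).
    x = x.detach().clone().requires_grad_(True)
    f = prob_fn(x)  # shape (B,), values in (0, 1]
    grads = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    sq_norm = grads.flatten(start_dim=1).pow(2).sum(dim=1)
    return (sq_norm / (f + eps)).mean()

def balanced_seg_loss(logits, target, per_modality_info, lam=0.1):
    # Task loss minus a reward on the *weakest* modality's Fisher
    # information, so training cannot concentrate on one easily
    # learnable modality (the unimodal-dominance failure mode).
    task = F.cross_entropy(logits, target)
    return task - lam * torch.stack(per_modality_info).min()

In the multi-scale module described above, the same term would be evaluated on high-level feature maps as well as on the final predictions; how the per-scale terms are weighted or combined is a design choice the abstract leaves open.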
@article{zheng2025_2505.06635,
  title={Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization},
  author={Xu Zheng and Yuanhuiyi Lyu and Lutao Jiang and Danda Pani Paudel and Luc Van Gool and Xuming Hu},
  journal={arXiv preprint arXiv:2505.06635},
  year={2025}
}