Uncertainty-Aware Collaborative System of Large and Small Models for Multimodal Sentiment Analysis

27 August 2025

Main: 10 pages, 5 figures, 9 tables. Bibliography: 3 pages.

Abstract

Multimodal Large Language Models (MLLMs) have notably improved performance on Multimodal Sentiment Analysis (MSA), yet their massive parameter scale leads to excessive resource consumption in training and inference, severely limiting efficiency. To balance performance and efficiency for MSA, this paper proposes an Uncertainty-Aware Collaborative System (U-ACS) that integrates an Uncertainty-aware Baseline Model (UBM) with MLLMs. U-ACS operates in three stages. First, all samples are processed by the UBM, which retains high-confidence samples and forwards low-confidence samples to the MLLM. Notably, because the continuous outputs of regression tasks hinder uncertainty calculation, we convert continuous sentiment label prediction into a classification task, enabling a more accurate calculation of entropy and uncertainty. Second, the MLLM performs an initial round of inference. In this stage, high-confidence samples, as well as low-confidence samples whose predicted sentiment polarity matches that of the UBM, are accepted, while the remaining samples are forwarded for further processing. Finally, the MLLM performs secondary inference on the remaining low-confidence samples using prompts augmented with the previous rounds' predictions as references. By aggregating results from the three stages, U-ACS preserves high MSA prediction accuracy while substantially improving efficiency by offloading most simple samples to the UBM and minimizing the number of samples the MLLM must process. Extensive experiments verify that U-ACS maintains superior performance while significantly reducing computational overhead and resource consumption.
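The three-stage routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the seven sentiment bins, the use of normalized Shannon entropy as the uncertainty score, the expected-value decoding back to a continuous score, the thresholds `tau_ubm` and `tau_mllm`, and the callables `ubm`, `mllm`, and `mllm_refine` are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Probs = Sequence[float]


def normalized_entropy(probs: Probs) -> float:
    """Shannon entropy of a class distribution, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def expected_score(probs: Probs, bin_centers: Sequence[float]) -> float:
    """Decode class probabilities back to a continuous sentiment score."""
    return sum(p * c for p, c in zip(probs, bin_centers))


def polarity(score: float) -> int:
    """Sign of the sentiment score: -1, 0, or +1."""
    return (score > 0) - (score < 0)


def cascade(
    sample: object,
    ubm: Callable[[object], Probs],            # hypothetical small model
    mllm: Callable[[object], Probs],           # hypothetical MLLM, first pass
    mllm_refine: Callable[[object, float, float], float],  # hypothetical second pass
    bin_centers: Sequence[float],
    tau_ubm: float = 0.5,                      # assumed uncertainty threshold
    tau_mllm: float = 0.5,                     # assumed uncertainty threshold
) -> Tuple[float, int]:
    """Return (sentiment score, stage that produced it)."""
    # Stage 1: the small model keeps samples it is confident about.
    p_ubm = ubm(sample)
    s_ubm = expected_score(p_ubm, bin_centers)
    if normalized_entropy(p_ubm) <= tau_ubm:
        return s_ubm, 1

    # Stage 2: accept the MLLM output if it is confident,
    # or if its polarity agrees with the small model.
    p_mllm = mllm(sample)
    s_mllm = expected_score(p_mllm, bin_centers)
    confident = normalized_entropy(p_mllm) <= tau_mllm
    if confident or polarity(s_mllm) == polarity(s_ubm):
        return s_mllm, 2

    # Stage 3: second MLLM pass, given both earlier predictions as references.
    return mllm_refine(sample, s_ubm, s_mllm), 3
```

In this sketch, a sample reaches the MLLM only when the small model's predicted distribution is too flat, and it reaches the second MLLM pass only when the first pass is both uncertain and disagrees with the small model on polarity. The saving from offloading easy samples to the UBM therefore depends directly on the chosen thresholds.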
