Bimodal Connection Attention Fusion for Speech Emotion Recognition

8 March 2025
Jiachen Luo, Huy Phan, Lin Wang, Joshua D. Reiss
Abstract

Multi-modal emotion recognition is challenging because it is difficult to extract features that capture subtle emotional differences. Understanding multi-modal interactions and connections is key to building effective bimodal speech emotion recognition systems. In this work, we propose the Bimodal Connection Attention Fusion (BCAF) method, which comprises three main modules: the interactive connection network, the bimodal attention network, and the correlative attention network. The interactive connection network uses an encoder-decoder architecture to model modality connections between audio and text while leveraging modality-specific features. The bimodal attention network enhances semantic complementarity and exploits intra- and inter-modal interactions. The correlative attention network reduces cross-modal noise and captures correlations between audio and text. Experiments on the MELD and IEMOCAP datasets demonstrate that the proposed BCAF method outperforms existing state-of-the-art baselines.
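Since the abstract names the three modules without architectural details, the following PyTorch sketch shows one plausible way to wire them together. All class names, dimensions, the shared cross-attention module, and the gated pooling in the correlative step are assumptions made for illustration; they are not the paper's actual implementation (see the arXiv preprint for that).

import torch
import torch.nn as nn


class InteractiveConnectionNetwork(nn.Module):
    # Encoder-decoder pairing: audio is encoded, text is decoded against it,
    # so each modality keeps its own stream while attending to the other.
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoderLayer(dim, n_heads, batch_first=True)

    def forward(self, audio, text):
        audio_enc = self.encoder(audio)
        text_dec = self.decoder(text, audio_enc)  # text queries attend to audio memory
        return audio_enc, text_dec


class BimodalAttentionNetwork(nn.Module):
    # Intra-modal self-attention plus inter-modal cross-attention.
    # Sharing one cross-attention module in both directions is a
    # simplification, not something the abstract specifies.
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, a, t):
        a_intra, _ = self.intra(a, a, a)  # interactions within the audio stream
        a_inter, _ = self.inter(a, t, t)  # audio attends to text
        t_inter, _ = self.inter(t, a, a)  # text attends to audio
        return a_intra + a_inter, t_inter


class CorrelativeAttentionNetwork(nn.Module):
    # A gated fusion of pooled audio/text features: one plausible way to
    # "reduce cross-modal noise" by weighting the more reliable modality.
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a, t):
        a_vec, t_vec = a.mean(dim=1), t.mean(dim=1)  # pool over time
        g = self.gate(torch.cat([a_vec, t_vec], dim=-1))
        return g * a_vec + (1 - g) * t_vec  # per-feature gated mixture


class BCAFSketch(nn.Module):
    def __init__(self, dim: int = 256, n_classes: int = 7):  # MELD has 7 emotion labels
        super().__init__()
        self.icn = InteractiveConnectionNetwork(dim)
        self.ban = BimodalAttentionNetwork(dim)
        self.can = CorrelativeAttentionNetwork(dim)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, audio, text):
        a, t = self.icn(audio, text)  # model modality connections
        a, t = self.ban(a, t)         # intra-/inter-modal attention
        fused = self.can(a, t)        # (batch, dim) fused representation
        return self.classifier(fused)


if __name__ == "__main__":
    model = BCAFSketch()
    audio = torch.randn(2, 50, 256)  # (batch, audio frames, feature dim)
    text = torch.randn(2, 30, 256)   # (batch, text tokens, feature dim)
    print(model(audio, text).shape)  # torch.Size([2, 7])

Running the script prints torch.Size([2, 7]), i.e. per-utterance logits over seven emotion classes.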

@article{luo2025_2503.05858,
  title={Bimodal Connection Attention Fusion for Speech Emotion Recognition},
  author={Jiachen Luo and Huy Phan and Lin Wang and Joshua D. Reiss},
  journal={arXiv preprint arXiv:2503.05858},
  year={2025}
}