ResearchTrend.AI

Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition Across Languages

25 March 2025
Heqing Zou
Fengmao Lv
Desheng Zheng
Eng Siong Chng
Deepu Rajan
Abstract

Multilingual speech emotion recognition aims to estimate a speaker's emotional state using a contactless method across different languages. However, variability in voice characteristics and linguistic diversity pose significant challenges for zero-shot speech emotion recognition, especially with multilingual datasets. In this paper, we propose leveraging contrastive learning to refine multilingual speech features and extend large language models for zero-shot multilingual speech emotion estimation. Specifically, we employ a novel two-stage training framework to align speech signals with linguistic features in the emotional space, capturing both emotion-aware and language-agnostic speech representations. To advance research in this field, we introduce a large-scale synthetic multilingual speech emotion dataset, M5SER. Our experiments demonstrate the effectiveness of the proposed method in both speech emotion recognition and zero-shot multilingual speech emotion recognition, including on previously unseen datasets and languages.
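The abstract describes aligning speech signals with linguistic features via contrastive learning. The paper's actual training code is not shown here; as a rough illustration of the general idea, a symmetric InfoNCE-style alignment loss between paired speech and text embeddings (all names and shapes below are assumptions, not taken from the paper) might be sketched as:

```python
import numpy as np

def contrastive_alignment_loss(speech_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over paired embeddings.

    speech_emb, text_emb: (N, D) arrays where row i of each matrix is a
    matched speech/text pair. Matched pairs are pulled together and
    mismatched pairs pushed apart in the shared embedding space.
    (Illustrative sketch only; temperature value is a common default,
    not the paper's setting.)
    """
    # L2-normalize so the dot product is cosine similarity
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (N, N) pairwise similarities

    def xent_diag(lg):
        # Cross-entropy with the correct match on the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the speech-to-text and text-to-speech directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

With such a loss, correctly aligned pairs yield a lower value than shuffled pairs, which is the property that drives the embeddings toward an emotion-aware, language-agnostic space.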

@article{zou2025_2503.21806,
  title={Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition Across Languages},
  author={Heqing Zou and Fengmao Lv and Desheng Zheng and Eng Siong Chng and Deepu Rajan},
  journal={arXiv preprint arXiv:2503.21806},
  year={2025}
}