
Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition

Abstract

This article presents our results for the eighth Affective Behavior Analysis in-the-wild (ABAW) Competition (this http URL). Emotion recognition (ER) has important applications in affective computing and human-computer interaction. However, in the real world, compound emotion recognition faces greater uncertainty and modality conflicts. For the Compound Expression (CE) Recognition Challenge, this paper proposes a multimodal emotion recognition method that fuses features from a Vision Transformer (ViT) and a Residual Network (ResNet). We conducted experiments on the C-EXPR-DB and MELD datasets. The results show that in scenarios with complex visual and audio cues (such as C-EXPR-DB), the model that fuses ViT and ResNet features exhibits superior performance (this http URL). The code is available at this https URL.
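The abstract describes a dual visual feature extraction scheme in which ViT and ResNet features are fused before classification. The snippet below is a minimal, hypothetical sketch of that idea for the visual stream only; the backbone variants (ResNet-50, ViT-B/16), feature dimensions, fusion-by-concatenation head, and the choice of 7 compound-expression classes are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of a dual visual feature extractor (not the authors' exact code).
# Two backbones produce complementary visual features that are concatenated and classified.
import torch
import torch.nn as nn
from torchvision.models import resnet50, vit_b_16


class DualVisualFeatureExtractor(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # ResNet-50 backbone; strip the classification head to expose 2048-d features.
        self.resnet = resnet50(weights=None)
        self.resnet.fc = nn.Identity()
        # ViT-B/16 backbone; strip the classification head to expose 768-d features.
        self.vit = vit_b_16(weights=None)
        self.vit.heads = nn.Identity()
        # Fuse the two feature streams by concatenation, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 768, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of face crops, shape (B, 3, 224, 224)
        f_res = self.resnet(x)         # (B, 2048)
        f_vit = self.vit(x)            # (B, 768)
        fused = torch.cat([f_res, f_vit], dim=1)
        return self.classifier(fused)  # (B, num_classes) compound-expression logits


if __name__ == "__main__":
    model = DualVisualFeatureExtractor(num_classes=7)
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 7])
```

In a full multimodal pipeline, audio features (e.g., for MELD) would be extracted by a separate encoder and combined with the fused visual representation before the final classifier; that part is omitted here.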

@article{liu2025_2503.17453,
  title={Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition},
  author={Ran Liu and Fengyu Zhang and Cong Yu and Longjiang Yang and Zhuofan Wen and Siyuan Zhang and Hailiang Yao and Shun Chen and Zheng Lian and Bin Liu},
  journal={arXiv preprint arXiv:2503.17453},
  year={2025}
}