Solution for 8th Competition on Affective & Behavior Analysis in-the-wild
In this report, we present our solution for the Action Unit (AU) Detection Challenge in the 8th Competition on Affective Behavior Analysis in-the-wild. To classify facial action units robustly and accurately in in-the-wild conditions, we introduce a method that leverages audio-visual multimodal data. Our method employs ConvNeXt as the image encoder and uses Whisper to extract Mel spectrogram features from the audio. A feature fusion module based on a Transformer encoder then integrates the affective information embedded in the audio and image features. The fused, high-dimensional representation is passed to a multilayer perceptron (MLP) trained on the Aff-Wild2 dataset, with the aim of improving the accuracy of AU detection.
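The fusion step described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes the image and audio features have already been extracted (ConvNeXt and Whisper are not called here) and projected to a common dimension, it replaces the Transformer encoder with a single single-head self-attention layer with a residual connection, and it assumes 12 output AUs. All function names, dimensions, and the random weights are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(dim, hidden, num_aus, seed=0):
    """Hypothetical parameter set: attention projections plus a two-layer MLP."""
    rng = random.Random(seed)
    return {
        "Wq": random_matrix(dim, dim, rng),
        "Wk": random_matrix(dim, dim, rng),
        "Wv": random_matrix(dim, dim, rng),
        "W1": random_matrix(hidden, dim, rng),
        "W2": random_matrix(num_aus, hidden, rng),
    }


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    d = len(tokens[0])
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    out = []
    for q, t in zip(Q, tokens):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)]
        out.append([a + b for a, b in zip(attended, t)])
    return out


def fuse_and_classify(image_tokens, audio_tokens, params):
    """Fuse both modalities and return one sigmoid probability per AU."""
    # Concatenate along the sequence axis so every token can attend across modalities.
    tokens = image_tokens + audio_tokens
    fused = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    d = len(fused[0])
    # Mean-pool the fused tokens into a single feature vector.
    pooled = [sum(t[j] for t in fused) / len(fused) for j in range(d)]
    # MLP head: linear -> ReLU -> linear -> sigmoid (multi-label output).
    hidden = [max(0.0, h) for h in matvec(params["W1"], pooled)]
    logits = matvec(params["W2"], hidden)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 8  # assumed shared feature dimension
    image_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
    audio_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
    params = init_params(dim=dim, hidden=16, num_aus=12, seed=0)
    probs = fuse_and_classify(image_tokens, audio_tokens, params)
    print([round(p, 3) for p in probs])
```

Concatenating the two token sequences before attention lets each image token weight audio tokens and vice versa, which is one common way to realize encoder-based fusion. Independent sigmoid outputs are used because several AUs can be active in the same frame.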
@article{yu2025_2503.11115,
  title={Solution for 8th Competition on Affective & Behavior Analysis in-the-wild},
  author={Jun Yu and Yunxiang Zhang and Xilong Lu and Yang Zheng and Yongqi Wang and Lingsi Zhu},
  journal={arXiv preprint arXiv:2503.11115},
  year={2025}
}