2401.05698
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
11 January 2024
Licai Sun
Zheng Lian
Bin Liu
Jianhua Tao
Papers citing
"HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition"
13 / 13 papers shown
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
Hao Cheng
Zhiwei Zhao
Yichao He
Zhenzhen Hu
Jia Li
M. Wang
Richang Hong
26
0
0
05 May 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
38
0
0
25 Apr 2025
SVFAP: Self-supervised Video Facial Affect Perceiver
Licai Sun
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
30
14
0
31 Dec 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
33
13
0
19 Sep 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
45
45
0
09 Mar 2023
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
44
20
0
27 Sep 2022
Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?
Vandana Rajan
A. Brutti
Andrea Cavallaro
21
26
0
18 Feb 2022
Self-attention fusion for audiovisual emotion recognition with incomplete data
K. Chumachenko
Alexandros Iosifidis
M. Gabbouj
58
30
0
26 Jan 2022
A Pre-trained Audio-Visual Transformer for Emotion Recognition
Minh Tran
M. Soleymani
44
25
0
23 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
5,353
0
11 Nov 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
206
1,954
0
14 Jun 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
136
1,403
0
06 Jun 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
226
9,999
0
18 May 2015