ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.03568
  4. Cited By
A vector quantized masked autoencoder for audiovisual speech emotion recognition

A vector quantized masked autoencoder for audiovisual speech emotion recognition

5 May 2023
Samir Sadok
Simon Leglaive
Renaud Séguier
    SSL
ArXivPDFHTML

Papers citing "A vector quantized masked autoencoder for audiovisual speech emotion recognition"

10 / 10 papers shown
Title
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
62
0
0
20 Feb 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
40
1
0
10 Jan 2025
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche
Simon Leglaive
Xavier Alameda-Pineda
Francesc Moreno-Noguer
3DH
41
1
0
29 May 2024
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised
  Audio-Visual Emotion Recognition
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
Licai Sun
Zheng Lian
Bin Liu
Jianhua Tao
41
29
0
11 Jan 2024
Self-attention fusion for audiovisual emotion recognition with
  incomplete data
Self-attention fusion for audiovisual emotion recognition with incomplete data
K. Chumachenko
Alexandros Iosifidis
M. Gabbouj
65
37
0
26 Jan 2022
A Pre-trained Audio-Visual Transformer for Emotion Recognition
A Pre-trained Audio-Visual Transformer for Emotion Recognition
Minh Tran
M. Soleymani
54
25
0
23 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
7,337
0
11 Nov 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
229
573
0
22 Apr 2021
Improved Baselines with Momentum Contrastive Learning
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
229
3,029
0
09 Mar 2020
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
206
1,954
0
14 Jun 2018
1