arXiv: 2407.18552
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
20 February 2025
Joe Dhanith, Shravan Venkatraman, Modigari Narendra, Vigya Sharma, Santhosh Malarvannan
Papers citing "Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention" (9 papers)
Self-attention fusion for audiovisual emotion recognition with incomplete data
K. Chumachenko, Alexandros Iosifidis, M. Gabbouj
65 · 37 · 0 · 26 Jan 2022
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk, Jacek Mańdziuk
DiffM · 38 · 12 · 0 · 27 Sep 2021
Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition
Zhengyao Wen, Wen-Long Lin, Tao Wang, Ge Xu
CVBM · 91 · 204 · 0 · 15 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong
ViT · 229 · 573 · 0 · 22 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang, Linchao Zhu, Yi Yang
138 · 166 · 0 · 20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li
CLIP, VLM · 298 · 771 · 0 · 18 Apr 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
ViT · 396 · 532 · 0 · 21 Jul 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer
178 · 204 · 0 · 23 Jan 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
136 · 1,458 · 0 · 06 Jun 2016