Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.15813
Cited By
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
22 May 2024
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
Re-assign community
ArXiv
PDF
HTML
Papers citing
"From CNNs to Transformers in Multimodal Human Action Recognition: A Survey"
8 / 8 papers shown
Title
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
19
4
0
11 Sep 2022
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
D. Curto
Albert Clapés
Javier Selva
Sorina Smeureanu
Julio C. S. Jacques Junior
...
G. Guilera
D. Leiva
T. Moeslund
Sergio Escalera
Cristina Palmero
20
25
0
20 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
229
573
0
22 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
3,790
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
272
1,939
0
09 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
396
532
0
21 Jul 2020
Gimme Signals: Discriminative signal encoding for multimodal activity recognition
Raphael Memmesheimer
Nick Theisen
Dietrich Paulus
33
50
0
13 Mar 2020
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
170
3,480
0
20 Aug 2019
1