M&M Mix: A Multimodal Multiview Transformer Ensemble

20 June 2022
Xuehan Xiong
Anurag Arnab
Arsha Nagrani
Cordelia Schmid
    ViT

Papers citing "M&M Mix: A Multimodal Multiview Transformer Ensemble"

18 / 18 papers shown
Improving Keystep Recognition in Ego-Video via Dexterous Focus
Zachary Chavis
Stephen J. Guy
Hyun Soo Park
232
1
0
01 Jun 2025
Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities
Maria Santos-Villafranca
Dustin Carrión-Ojeda
Alejandro Pérez-Yus
J. Bermudez-Cameo
Jose J. Guerrero
Simone Schaub-Meyer
EgoV
VLM
308
0
0
11 Apr 2025
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Tanay Agrawal
Mohammed Guermal
Michal Balazia
François Brémond
185
0
0
08 Jan 2025
Sensitive Image Classification by Vision Transformers
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Hanxian He
Campbell Wilson
Thanh Thi Nguyen
Janis Dalins
ViT
263
1
0
21 Dec 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
258
24
0
08 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
140
4
0
28 Mar 2024
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
221
22
0
28 Sep 2023
IndGIC: Supervised Action Recognition under Low Illumination
Jing-Teng Zeng
165
3
0
29 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
227
4
0
23 Aug 2023
An Outlook into the Future of Egocentric Vision
International Journal of Computer Vision (IJCV), 2023
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
242
72
0
14 Aug 2023
Multimodal Distillation for Egocentric Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
255
34
0
14 Jul 2023
Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023
Yuqi Li
Yi-Jhen Luo
Xiaoshuai Hao
Chuanguang Yang
Zhulin An
Dantong Song
Wei Yi
130
0
0
15 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
162
4
0
07 Jun 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Neurocomputing, 2023
Thanh-Dat Truong
Khoa Luu
EgoV
365
15
0
25 May 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
269
55
0
01 Feb 2023
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temizel
159
8
0
08 Dec 2022
Students taught by multimodal teachers are superior action recognizers
Gorjan Radevski
Dusan Grujicic
Matthew Blaschko
Marie-Francine Moens
Tinne Tuytelaars
187
2
0
09 Oct 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Lin Wang
ViT
185
62
0
13 Sep 2022