M&M Mix: A Multimodal Multiview Transformer Ensemble

20 June 2022

Cordelia Schmid

Papers citing "M&M Mix: A Multimodal Multiview Transformer Ensemble"

18 / 18 papers shown

Title
Improving Keystep Recognition in Ego-Video via Dexterous Focus Zachary Chavis Stephen J. Guy Hyun Soo Park 232 1 0 01 Jun 2025
Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities Maria Santos-Villafranca Dustin Carrión-Ojeda Alejandro Pérez-Yus J. Bermudez-Cameo Jose J. Guerrero Simone Schaub-Meyer EgoV VLM 308 0 0 11 Apr 2025
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025 Tanay Agrawal Mohammed Guermal Michal Balazia François Brémond 185 0 0 08 Jan 2025
Sensitive Image Classification by Vision Transformers IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024 Hanxian He Campbell Wilson Thanh Thi Nguyen Janis Dalins ViT 263 1 0 21 Dec 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition Jacob Chalk Jaesung Huh Evangelos Kazakos Andrew Zisserman Dima Damen 258 24 0 08 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization Anna Kukleva Fadime Sener Edoardo Remelli Bugra Tekin Eric Sauser Bernt Schiele Shugao Ma VLM EgoV 140 4 0 28 Mar 2024
Training a Large Video Model on a Single Machine in a Day Yue Zhao Philipp Krahenbuhl VLM 221 22 0 28 Sep 2023
IndGIC: Supervised Action Recognition under Low Illumination Jing-Teng Zeng 165 3 0 29 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding Mona Ahmadian Frank Guerin Andrew Gilbert 227 4 0 23 Aug 2023
An Outlook into the Future of Egocentric Vision International Journal of Computer Vision (IJCV), 2023 Chiara Plizzari Gabriele Goletto Antonino Furnari Siddhant Bansal Francesco Ragusa G. Farinella Dima Damen Tatiana Tommasi EgoV 242 72 0 14 Aug 2023
Multimodal Distillation for Egocentric Action Recognition IEEE International Conference on Computer Vision (ICCV), 2023 Gorjan Radevski Dusan Grujicic Marie-Francine Moens Matthew Blaschko Tinne Tuytelaars EgoV 255 34 0 14 Jul 2023
Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023 Yuqi Li Yi-Jhen Luo Xiaoshuai Hao Chuanguang Yang Zhulin An Dantong Song Wei Yi 130 0 0 15 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition Shreyank N. Gowda Anurag Arnab Jonathan Huang ViT 162 4 0 07 Jun 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective Neurocomputing (Neurocomputing), 2023 Thanh-Dat Truong Khoa Luu EgoV 365 15 0 25 May 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Jaesung Huh Jacob Chalk Evangelos Kazakos Dima Damen Andrew Zisserman EgoV 269 55 0 01 Feb 2023
Deep Architectures for Content Moderation and Movie Content Rating Fatih Çagatay Akyön A. Temizel 159 8 0 08 Dec 2022
Students taught by multimodal teachers are superior action recognizers Gorjan Radevski Dusan Grujicic Matthew Blaschko Marie-Francine Moens Tinne Tuytelaars 187 2 0 09 Oct 2022
Vision Transformers for Action Recognition: A Survey Anwaar Ulhaq Naveed Akhtar Ganna Pogrebna Lin Wang ViT 185 62 0 13 Sep 2022