Title
Learning State-Aware Visual Representations from Audible Interactions Himangi Mittal Pedro Morgado Unnat Jain Abhinav Gupta 52 20 0 27 Sep 2022
Masked Autoencoders Are Scalable Vision Learners Kaiming He Xinlei Chen Saining Xie Yanghao Li Piotr Dollár Ross B. Girshick ViT TPM 255 5,353 0 11 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 212 682 0 13 Oct 2021
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 206 1,954 0 14 Jun 2018
Lip Reading Sentences in the Wild Joon Son Chung A. Senior Oriol Vinyals Andrew Zisserman 138 782 0 16 Nov 2016