Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

10 July 2023

Papers citing "Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos"


Title | Authors | Tags | Counts | Date
Images that Sound: Composing Images and Sounds on a Single Canvas | Ziyang Chen, Daniel Geng, Andrew Owens | DiffM | 46 8 0 | 20 May 2024
Masked Autoencoders Are Scalable Vision Learners | Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick | ViT, TPM | 258 7,337 0 | 11 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video | Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik | EgoV | 224 1,017 0 | 13 Oct 2021
MAAS: Multi-modal Assignation for Active Speaker Detection | Juan Carlos León Alcázar, Fabian Caba Heilbron, Ali K. Thabet, Bernard Ghanem | | 55 51 0 | 11 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency | Ruohan Gao, Kristen Grauman | CVBM | 185 196 0 | 08 Jan 2021
Audio-Visual Floorplan Reconstruction | Senthil Purushwalkam, S. V. A. Garí, V. Ithapu, Carl Schissler, Philip Robinson, Abhinav Gupta, Kristen Grauman | VGen, 3DV | 60 41 0 | 31 Dec 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation | Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman | MDE, SSL | 156 83 0 | 04 May 2020