There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

1 March 2021

Francisco Rivera Valverde

Juana Valeria Hurtado

Abhinav Valada

Papers citing "There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge"

Title | Authors | Tag | Metrics | Date
AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness | Yizhuo Yang, Shenghai Yuan, Muqing Cao, Jianfei Yang, Lihua Xie | | 49 7 0 | 11 Nov 2024
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions | Jinzheng Zhao, Yong-mei Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang | BDL | 32 7 0 | 23 Oct 2023
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight | Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek | VLM | 26 0 0 | 05 Dec 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization | Yuanyuan Jiang, Jianqin Yin, Yonghao Dang | | 27 4 0 | 11 Oct 2022
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds | Wu Zheng, Ming-Hong Hong, Li Jiang, Chi-Wing Fu | 3DPC | 14 29 0 | 30 Jun 2022
Amodal Panoptic Segmentation | Rohit Mohan, Abhinav Valada | | 9 43 0 | 23 Feb 2022
Audiovisual SlowFast Networks for Video Recognition | Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer | | 192 204 0 | 23 Jan 2020
Aggregated Residual Transformations for Deep Neural Networks | Saining Xie, Ross B. Girshick, Piotr Dollár, Z. Tu, Kaiming He | | 261 10,106 0 | 16 Nov 2016