Learning to Localize Sound Source in Visual Scenes

10 March 2018

Arda Senocak

Tae-Hyun Oh

Junsik Kim

Ming-Hsuan Yang

In So Kweon

SSL

Papers citing "Learning to Localize Sound Source in Visual Scenes"

40 / 90 papers shown

Unsupervised Sound Localization via Iterative Contrastive Learning Yan-Bo Lin Hung-Yu Tseng Hsin-Ying Lee Yen-Yu Lin Ming-Hsuan Yang SSL 32 34 0 01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning Adrià Recasens Pauline Luc Jean-Baptiste Alayrac Luyu Wang Ross Hemsley ... Florent Altché M. Valko Jean-Bastien Grill Aaron van den Oord Andrew Zisserman SSL AI4TS 33 127 0 30 Mar 2021
Read and Attend: Temporal Localisation in Sign Language Videos Gül Varol Liliane Momeni Samuel Albanie Triantafyllos Afouras Andrew Zisserman SLR 24 40 0 30 Mar 2021
Robust Audio-Visual Instance Discrimination Pedro Morgado Ishan Misra Nuno Vasconcelos SSL 22 110 0 29 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde Juana Valeria Hurtado Abhinav Valada 28 72 0 01 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation Ye Zhu Yu Wu Hugo Latapie Yi Yang Yan Yan SSL 44 20 0 05 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency Ruohan Gao Kristen Grauman CVBM 196 199 0 08 Jan 2021
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction Samyak Jain P. Yarlagadda Shreyank Jyoti Shyamgopal Karthik Subramanian Ramanathan Vineet Gandhi ViT 29 66 0 11 Dec 2020
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things Jing Zhang Dacheng Tao 45 463 0 17 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment Pedro Morgado Yi Li Nuno Vasconcelos SSL 27 121 0 03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds Efthymios Tzinis Scott Wisdom A. Jansen Shawn Hershey Tal Remez D. Ellis J. Hershey 39 69 0 02 Nov 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention Bin Duan Hao Tang Wei Wang Ziliang Zong Guowei Yang Yan Yan 33 59 0 14 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video Triantafyllos Afouras Andrew Owens Joon Son Chung Andrew Zisserman SSL 19 253 0 10 Aug 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing Yapeng Tian Dingzeyu Li Chenliang Xu 34 181 0 21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation Hang Zhou Xudong Xu Dahua Lin Xiaogang Wang Ziwei Liu DiffM 32 81 0 20 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine Rui Qian Di Hu Heinrich Dinkel Mengyue Wu N. Xu Weiyao Lin 28 155 0 13 Jul 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound Karren D. Yang Bryan C. Russell Justin Salamon SSL 24 75 0 11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network Lingyu Zhu Esa Rahtu 22 23 0 04 Jun 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation Ruohan Gao Changan Chen Ziad Al-Halah Carl Schissler Kristen Grauman MDE SSL 171 84 0 04 May 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning Sanchita Ghose John J. Prevost VGen 22 46 0 21 Feb 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 197 207 0 23 Jan 2020
Deep Audio-Visual Learning: A Survey Hao Zhu Mandi Luo Rui Wang A. Zheng Ran He 31 156 0 14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network A. Tsiami Petros Koutras Petros Maragos 27 73 0 09 Jan 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation Chuang Gan Yiwei Zhang Jiajun Wu Boqing Gong J. Tenenbaum 24 137 0 25 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications Arda Senocak Tae-Hyun Oh Junsik Kim Ming-Hsuan Yang In So Kweon SSL 33 52 0 20 Nov 2019
Vision-Infused Deep Audio Inpainting Hang Zhou Ziwei Liu Lingfeng Guo Ping Luo Dahua Lin 35 88 0 24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos Kranti K. Parida Neeraj Matiyali T. Guha Gaurav Sharma VLM 35 41 0 19 Oct 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning Tanzila Rahman Bicheng Xu Leonid Sigal 30 78 0 22 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net Xudong Xu Bo Dai Dahua Lin 35 91 0 30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen EgoV 16 332 0 22 Aug 2019
Speech2Face: Learning the Face Behind a Voice Tae-Hyun Oh Tali Dekel Changil Kim Inbar Mosseri William T. Freeman Michael Rubinstein Wojciech Matusik SSL CVBM 33 163 0 23 May 2019
Audio-Visual Model Distillation Using Acoustic Images Andrés F. Pérez Valentina Sanguineti Pietro Morerio Vittorio Murino VLM 15 27 0 16 Apr 2019
Co-Separating Sounds of Visual Objects Ruohan Gao Kristen Grauman 33 206 0 16 Apr 2019
2.5D Visual Sound Ruohan Gao Kristen Grauman VGen 27 130 0 11 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning Yapeng Tian Chenxiao Guan Justin Goodman Marc Moore Chenliang Xu 36 20 0 07 Dec 2018
Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion Suwon Shon Tae-Hyun Oh James R. Glass 19 50 0 27 Nov 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features Andrew Owens Alexei A. Efros SSL 51 745 0 10 Apr 2018
The Sound of Pixels Hang Zhao Chuang Gan Andrew Rouditchenko Carl Vondrick Josh H. McDermott Antonio Torralba VLM 22 529 0 09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos Yapeng Tian Jing Shi Bochen Li Zhiyao Duan Chenliang Xu 53 426 0 23 Mar 2018
Objects that Sound Relja Arandjelović Andrew Zisserman ObjD VOS 44 528 0 18 Dec 2017