
VLM

197

03 May 2023

Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

390

12 Apr 2023

Audio-Visual Grouping Network for Sound Localization from Mixtures

Computer Vision and Pattern Recognition (CVPR), 2023

Shentong Mo

158

29 Mar 2023

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

IEEE International Conference on Computer Vision (ICCV), 2023

Ziyang Chen

Shengyi Qian

236

20 Mar 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Neural Information Processing Systems (NeurIPS), 2023

353

04 Feb 2023

Novel-View Acoustic Synthesis

Computer Vision and Pattern Recognition (CVPR), 2023

Natalia Neverova

Andrea Vedaldi

210

20 Jan 2023

iQuery: Instruments as Queries for Audio-Visual Sound Separation

Computer Vision and Pattern Recognition (CVPR), 2022

279

07 Dec 2022

MarginNCE: Robust Sound Localization with a Negative Margin

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Sooyoung Park

Arda Senocak

Joon Son Chung

132

03 Nov 2022

Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation

Neural Information Processing Systems (NeurIPS), 2022

Moitreya Chatterjee

Narendra Ahuja

A. Cherian

196

29 Oct 2022

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Neural Information Processing Systems (NeurIPS), 2022

Shentong Mo

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

241

30 Aug 2022

292

20 Aug 2022

End-to-End Binaural Speech Synthesis

Interspeech, 2022

145

08 Jul 2022

Deep Learning for Omnidirectional Vision: A Survey and New Perspectives

306

21 May 2022

Learning Visual Styles from Audio-Visual Associations

European Conference on Computer Vision (ECCV), 2022

Hang Zhao

180

10 May 2022

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Computer Vision and Pattern Recognition (CVPR), 2022

Li Fei-Fei

Jiajun Wu

163

105

05 Apr 2022

Localizing Visual Sounds the Easy Way

European Conference on Computer Vision (ECCV), 2022

Shentong Mo

Audio-Visual Fusion Layers for Event Type Aware Video Recognition

258

17 Mar 2022

Visually Supervised Speaker Detection and Localization via Microphone Array

IEEE International Workshop on Multimedia Signal Processing (MMSP), 2021

Davide Berghi

A. Hilton

Philip J. B. Jackson

177

07 Mar 2022

In So Kweon

121

12 Feb 2022

Learning Sound Localization Better From Semantically Similar Samples

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

In So Kweon

133

07 Feb 2022

Class-aware Sounding Objects Localization via Audiovisual Correspondence

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

168

22 Dec 2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

British Machine Vision Conference (BMVC), 2021

Rishabh Garg

172

21 Nov 2021

Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021

188

15 Nov 2021

Structure from Silence: Learning Scene Structure from Ambient Sound

Conference on Robot Learning (CoRL), 2021

Ziyang Chen

Xixi Hu

Ego4D: Around the World in 3,000 Hours of Egocentric Video

157

10 Nov 2021

...

Antonio Torralba

Mingfei Yan

988

1,459

13 Oct 2021

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

IEEE International Conference on Computer Vision (ICCV), 2021

295

105

11 Oct 2021

Visual Scene Graphs for Audio Source Separation

IEEE International Conference on Computer Vision (ICCV), 2021

208

24 Sep 2021

V-SlowFast Network for Efficient Visual Sound Separation

Binaural Audio Generation via Multi-task Learning

Esa Rahtu

230

18 Sep 2021

Sijia Li

ASOD60K: An Audio-Induced Salient Object Detection Dataset for Panoramic Videos

135

02 Sep 2021

Yi Zhang

260

24 Jul 2021

FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos

IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021

Sanchita Ghose

John J. Prevost

GAN

155

20 Jul 2021

Improving Multi-Modal Learning with Uni-Modal Teachers

Yue Wang

Hang Zhao

107

21 Jun 2021

Learning Audio-Visual Dereverberation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

203

14 Jun 2021

Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

AAAI Conference on Artificial Intelligence (AAAI), 2021

Yan-Bo Lin

Y. Wang

204

03 May 2021

Points2Sound: From mono to binaural audio using 3D point cloud scenes

EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process), 2021

289

26 Apr 2021

On the Design of Deep Priors for Unsupervised Audio Restoration

Interspeech, 2021

V. Narayanaswamy

Jayaraman J. Thiagarajan

A. Spanias

AI4CE

120

14 Apr 2021

Visually Informed Binaural Audio Generation without Binaural Audios

Computer Vision and Pattern Recognition (CVPR), 2021

135

13 Apr 2021

Unsupervised Sound Localization via Iterative Contrastive Learning

Computer Vision and Image Understanding (CVIU), 2021

Yan-Bo Lin

Hung-Yu Tseng

Hsin-Ying Lee

Yen-Yu Lin

Ming-Hsuan Yang

Francisco Rivera Valverde

170

01 Apr 2021

Robust Audio-Visual Instance Discrimination

Computer Vision and Pattern Recognition (CVPR), 2021

240

117

29 Mar 2021

Beyond Image to Depth: Improving Depth Prediction using Echoes

Computer Vision and Pattern Recognition (CVPR), 2021

278

15 Mar 2021

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Computer Vision and Pattern Recognition (CVPR), 2021

Juana Valeria Hurtado

Abhinav Valada

219

01 Mar 2021

Multimodality in VR: A survey

ACM Computing Surveys (CSUR), 2021

Daniel Martin

Sandra Malpica

Diego F. F. Gutierrez

B. Masiá

Ana Serrano

205

117

20 Jan 2021

VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency

Computer Vision and Pattern Recognition (CVPR), 2021

Sound Synthesis, Propagation, and Rendering: A Survey

CVBM

448

237

08 Jan 2021

Learning Representations from Audio-Visual Spatial Alignment

352

11 Nov 2020

165

138

03 Nov 2020

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

Chuang Gan

Xiaoyu Chen

Phillip Isola

Antonio Torralba

J. Tenenbaum

136

27 Jul 2020

Foley Music: Learning to Generate Music from Videos

Chuang Gan

Antonio Torralba

127

151

21 Jul 2020

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

European Conference on Computer Vision (ECCV), 2020

191

20 Jul 2020

Leveraging Category Information for Single-Frame Visual Sound Source Separation

A Comprehensive Survey on Segment Anything Model for Vision and Beyond

Esa Rahtu

129

15 Jul 2020

Do We Need Sound for Sound Source Localization?

Asian Conference on Computer Vision (ACCV), 2020

131

11 Jul 2020

Self-Supervised Generation of Spatial Audio for 360 Video

7 September 2018

Papers citing "Self-Supervised Generation of Spatial Audio for 360 Video"

50 / 118 papers shown

392

127

14 May 2023

AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation

Shentong Mo
