ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.03849
  4. Cited By
Learning to Localize Sound Source in Visual Scenes

Learning to Localize Sound Source in Visual Scenes

10 March 2018
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
    SSL
ArXivPDFHTML

Papers citing "Learning to Localize Sound Source in Visual Scenes"

40 / 90 papers shown
Title
Unsupervised Sound Localization via Iterative Contrastive Learning
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
32
34
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
Read and Attend: Temporal Localisation in Sign Language Videos
Read and Attend: Temporal Localisation in Sign Language Videos
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
SLR
24
40
0
30 Mar 2021
Robust Audio-Visual Instance Discrimination
Robust Audio-Visual Instance Discrimination
Pedro Morgado
Ishan Misra
Nuno Vasconcelos
SSL
22
110
0
29 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection
  and Tracking with Sound by Distilling Multimodal Knowledge
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
28
72
0
01 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal
  Generation
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
44
20
0
05 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency
  Prediction
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
29
66
0
11 Dec 2020
Empowering Things with Intelligence: A Survey of the Progress,
  Challenges, and Opportunities in Artificial Intelligence of Things
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things
Jing Zhang
Dacheng Tao
45
463
0
17 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
27
121
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of
  On-Screen Sounds
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
39
69
0
02 Nov 2020
Audio-Visual Event Localization via Recursive Fusion by Joint
  Co-Attention
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
33
59
0
14 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video
  Parsing
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
34
181
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating
  Source Separation
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
32
81
0
20 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
28
155
0
13 Jul 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and
  Sound
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
24
75
0
11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter
  Network
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent
  Videos with Deep Learning
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
Sanchita Ghose
John J. Prevost
VGen
22
46
0
21 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
27
73
0
09 Jan 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
24
137
0
25 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and
  Applications
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
Vision-Infused Deep Audio Inpainting
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
35
88
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual
  Zeroshot Classification and Retrieval of Videos
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
35
41
0
19 Oct 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event
  Captioning
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
30
78
0
22 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
35
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action
  Recognition
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Speech2Face: Learning the Face Behind a Voice
Speech2Face: Learning the Face Behind a Voice
Tae-Hyun Oh
Tali Dekel
Changil Kim
Inbar Mosseri
William T. Freeman
Michael Rubinstein
Wojciech Matusik
SSL
CVBM
33
163
0
23 May 2019
Audio-Visual Model Distillation Using Acoustic Images
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
Co-Separating Sounds of Visual Objects
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
33
206
0
16 Apr 2019
2.5D Visual Sound
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
27
130
0
11 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Noise-tolerant Audio-visual Online Person Verification using an
  Attention-based Neural Network Fusion
Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion
Suwon Shon
Tae-Hyun Oh
James R. Glass
19
50
0
27 Nov 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
51
745
0
10 Apr 2018
The Sound of Pixels
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
529
0
09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
53
426
0
23 Mar 2018
Objects that Sound
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
44
528
0
18 Dec 2017
Previous
12