ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.03849
  4. Cited By
Learning to Localize Sound Source in Visual Scenes

Learning to Localize Sound Source in Visual Scenes

10 March 2018
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
    SSL
ArXivPDFHTML

Papers citing "Learning to Localize Sound Source in Visual Scenes"

50 / 90 papers shown
Title
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
Yuhang He
Sangyun Shin
Anoop Cherian
Niki Trigoni
Andrew Markham
93
0
0
31 Dec 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
36
2
0
01 Oct 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through
  Audio-Visual Alignment
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
43
3
0
18 Jul 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
56
0
0
04 Jun 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
52
10
0
17 Mar 2024
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous
  Driving
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin
Jiajun Chen
Kunyu Peng
Xuan He
Zhiyong Li
Rainer Stiefelhagen
Kailun Yang
61
6
0
28 Feb 2024
Audio-Visual Instance Segmentation
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
36
2
0
28 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
33
9
0
25 Oct 2023
Extending Multi-modal Contrastive Representations
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
31
5
0
13 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
33
3
0
10 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
42
9
0
16 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation,
  and Recognition
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
37
1
0
12 May 2023
Egocentric Auditory Attention Localization in Conversations
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
31
16
0
28 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
44
13
0
04 Mar 2023
LipLearner: Customizable Silent Speech Interactions on Mobile Devices
LipLearner: Customizable Silent Speech Interactions on Mobile Devices
Zixiong Su
Shitao Fang
Jun Rekimoto
18
26
0
12 Feb 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
56
0
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
34
27
0
07 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
35
51
0
28 Nov 2022
MarginNCE: Robust Sound Localization with a Negative Margin
MarginNCE: Robust Sound Localization with a Negative Margin
Sooyoung Park
Arda Senocak
Joon Son Chung
SSL
27
13
0
03 Nov 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using
  Permutation-Free Loss Function
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
19
4
0
26 Oct 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
85
64
0
30 Aug 2022
Online Video Instance Segmentation via Robust Context Fusion
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
45
5
0
12 Jul 2022
Audio-Visual Segmentation
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jingyang Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
33
109
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory
  Information: the Audio-visual Consistency Perceptual is the Key!
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
40
26
0
20 Jun 2022
Weakly-Supervised Action Detection Guided by Audio Narration
Weakly-Supervised Action Detection Guided by Audio Narration
Keren Ye
Adriana Kovashka
40
0
0
12 May 2022
Sound Localization by Self-Supervised Time Delay Estimation
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
29
19
0
26 Apr 2022
Learning Neural Acoustic Fields
Learning Neural Acoustic Fields
Andrew F. Luo
Yilun Du
Michael J. Tarr
J. Tenenbaum
Antonio Torralba
Chuang Gan
AI4CE
20
79
0
04 Apr 2022
The Sound of Bounding-Boxes
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
19
2
0
30 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
39
136
0
26 Mar 2022
Localizing Visual Sounds the Easy Way
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
35
78
0
17 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference
  Erasing
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
18
25
0
13 Feb 2022
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
Jannik Zürn
Wolfram Burgard
SSL
36
8
0
30 Jan 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
36
41
0
06 Jan 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
24
41
0
22 Dec 2021
Soundify: Matching Sound Effects to Video
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
30
16
0
17 Dec 2021
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Zhijian Yang
Xiaoran Fan
Volkan Isler
H. Park
3DH
26
6
0
01 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from
  Video
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with
  Depth and Cross Modal Attention
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
38
20
0
15 Nov 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised
  Audiovisual Representation Learning
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
36
0
0
13 Oct 2021
Visual Scene Graphs for Audio Source Separation
Visual Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Jonathan Le Roux
Narendra Ahuja
A. Cherian
26
36
0
24 Sep 2021
Audio-Visual Collaborative Representation Learning for Dynamic Saliency
  Prediction
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
Hailong Ning
Bin Zhao
Zhanxuan Hu
Lang He
Ercheng Pei
32
10
0
17 Sep 2021
The Right to Talk: An Audio-Visual Transformer Approach
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
63
36
0
06 Aug 2021
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
Xinyuan Qian
Maulik C. Madhavi
Zexu Pan
Jiadong Wang
Haizhou Li
27
44
0
13 May 2021
Visually Guided Sound Source Separation and Localization using
  Self-Supervised Motion Representations
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
Lingyu Zhu
Esa Rahtu
29
25
0
17 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
55
0
13 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound
  Separation
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
21
88
0
05 Apr 2021
Can audio-visual integration strengthen robustness under multimodal
  attacks?
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
12
Next