Multiple Sound Sources Localization from Coarse to Fine

European Conference on Computer Vision (ECCV), 2020
13 July 2020
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
ArXiv (abs) | PDF | HTML | GitHub (83★)

Papers citing "Multiple Sound Sources Localization from Coarse to Fine"

50 / 114 papers shown
Segmenting Collision Sound Sources in Egocentric Videos
Kranti Parida
Omar Emara
Hazel Doughty
Dima Damen
VOS
335
0
0
17 Nov 2025
Complementary and Contrastive Learning for Audio-Visual Segmentation
IEEE Transactions on Multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Pingping Zhang
Huchuan Lu
VOS
336
7
0
11 Oct 2025
Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation
Jinbae Seo
Hyeongjun Kwon
Kwonyoung Kim
Jiyoung Lee
Kwanghoon Sohn
VOS
323
1
0
26 Sep 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
244
4
0
26 Sep 2025
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola
G. Morais
Magdalena Fuentes
Gloria Haro
SSL
242
0
0
29 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
337
6
0
11 Aug 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
242
0
0
14 Jul 2025
Action Dubber: Timing Audible Actions via Inflectional Flow
Wenlong Wan
Weiying Zheng
Tianyi Xiang
Guiqing Li
Shengfeng He
239
0
0
16 Jun 2025
Learning to Highlight Audio by Watching Movies
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
372
5
0
17 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
367
2
0
08 May 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS, VLM
329
1
0
30 Apr 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Computer Vision and Pattern Recognition (CVPR), 2025
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
467
2
0
21 Apr 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
288
4
0
17 Mar 2025
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pengcheng Zhao
Jinxing Zhou
Yang Zhao
Dan Guo
Yanxiang Chen
376
19
0
15 Dec 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Computer Vision and Pattern Recognition (CVPR), 2024
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
622
30
0
18 Nov 2024
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Wei Wei
411
2
0
07 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Yibing Song
310
4
0
30 Oct 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xavier Juanola
Gloria Haro
Magdalena Fuentes
451
4
0
01 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
316
3
0
31 Aug 2024
Enhancing Sound Source Localization via False Negative Elimination
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zengjie Song
Jiangshe Zhang
Yuxi Wang
Junsong Fan
Zhaoxiang Zhang
382
4
0
29 Aug 2024
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
Zili Wang
Qi Yang
Linsu Shi
Jiazhong Yu
M. Tanveer
Fei Li
Shiming Xiang
VOS
286
5
0
03 Aug 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
477
10
0
18 Jul 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang
Dejan Marković
Chenliang Xu
Alexander Richard
383
14
0
18 Jul 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
VOS
381
26
0
16 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
357
5
0
04 Jul 2024
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
Khanh-Binh Nguyen
Chae Jung Park
VLM, VOS
439
5
0
02 Jul 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
214
14
0
07 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
433
1
0
04 Jun 2024
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
280
42
0
03 Jun 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
302
2
0
12 May 2024
SemiPL: A Semi-supervised Method for Event Sound Source Localization
Yue Li
Baiqiao Yin
Jinfu Liu
Jiajun Wen
Jiaying Lin
Mengyuan Liu
283
1
0
30 Apr 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM, CLIP
249
3
0
09 Apr 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2024
Tanvir Mahmud
Yapeng Tian
Diana Marculescu
239
23
0
02 Apr 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
218
16
0
26 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Helen Treharne
Jiangkang Deng
Xiatian Zhu
VOS
235
11
0
21 Mar 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
390
29
0
17 Mar 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
289
16
0
05 Mar 2024
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Yuhao Zhao
Hu Su
Wei Zou
276
4
0
05 Mar 2024
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin
Jiajun Chen
Kunyu Peng
Xuan He
Zhiyong Li
Rainer Stiefelhagen
Kailun Yang
337
26
0
28 Feb 2024
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
Tianxiang Chen
Zhentao Tan
Tao Gong
Qi Chu
Yue-bo Wu
Bin Liu
Le Lu
Jieping Ye
Nenghai Yu
VOS
348
10
0
04 Feb 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
356
8
0
08 Jan 2024
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang
Xing Nie
Tong Li
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VOS
262
29
0
11 Dec 2023
Weakly-Supervised Audio-Visual Segmentation
Neural Information Processing Systems (NeurIPS), 2023
Shentong Mo
Bhiksha Raj
VOS
354
24
0
25 Nov 2023
Can CLIP Help Sound Source Localization?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sooyoung Park
Arda Senocak
Joon Son Chung
216
16
0
07 Nov 2023
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Neural Information Processing Systems (NeurIPS), 2023
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
191
9
0
01 Nov 2023
Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design
International Conference on Learning Representations (ICLR), 2023
Heng Dong
Junyu Zhang
Chongjie Zhang
501
5
0
01 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuxin Ye
Wenming Yang
Yapeng Tian
283
12
0
31 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
216
57
0
12 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
521
14
0
10 Oct 2023
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Li
Jinglu Wang
Xiaohao Xu
Xiulian Peng
Rita Singh
Yan Lu
Bhiksha Raj
VOS
416
19
0
29 Sep 2023