ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.06406
  4. Cited By
Visual Sound Localization in the Wild by Cross-Modal Interference
  Erasing

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

AAAI Conference on Artificial Intelligence (AAAI), 2022
13 February 2022
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
ArXiv (abs)PDFHTMLGithub (29★)

Papers citing "Visual Sound Localization in the Wild by Cross-Modal Interference Erasing"

21 / 21 papers shown
Title
Learning from Silence and Noise for Visual Sound Source Localization
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola
G. Morais
Magdalena Fuentes
Gloria Haro
SSL
144
0
0
29 Aug 2025
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
Ziyi Yang
Fanqi Wan
Longguang Zhong
Canbin Huang
Guosheng Liang
Xiaojun Quan
MoMe
264
9
0
06 Mar 2025
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual SegmentationIEEE transactions on multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Yifan Wang
Pingping Zhang
Lijun Wang
Huchuan Lu
MambaVOS
105
13
0
14 Jan 2025
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing
  Audio-Visual Question Answering
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question AnsweringConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Wei Wei
280
1
0
07 Nov 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
A Critical Assessment of Visual Sound Source Localization Models Including Negative AudioIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xavier Juanola
Gloria Haro
Magdalena Fuentes
344
4
0
01 Oct 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for
  Audio-Visual Source Localization
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
213
16
0
05 Mar 2024
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source
  Localization
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Yuhao Zhao
Hu Su
Wei Zou
215
4
0
05 Mar 2024
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal
  Localized Alignment
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized AlignmentInternational Conference on Machine Learning (ICML), 2023
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
275
1
0
12 Oct 2023
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation
  Knowledge
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation KnowledgeIEEE transactions on multimedia (IEEE TMM), 2023
Chen Liu
Peike Li
Hu Zhang
Lincheng Li
Zi Huang
Dadong Wang
Xin Yu
VOS
164
43
0
20 Aug 2023
Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
Audio-Visual Segmentation by Exploring Cross-Modal Mutual SemanticsACM Multimedia (ACM MM), 2023
Chen Liu
Peike Li
Xingqun Qi
Hu Zhang
Lincheng Li
Dadong Wang
Xin Yu
VOS
215
43
0
31 Jul 2023
Connecting Multi-modal Contrastive Representations
Connecting Multi-modal Contrastive RepresentationsNeural Information Processing Systems (NeurIPS), 2023
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
173
40
0
22 May 2023
Egocentric Auditory Attention Localization in Conversations
Egocentric Auditory Attention Localization in ConversationsComputer Vision and Pattern Recognition (CVPR), 2023
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
219
23
0
28 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale
  Benchmark and Baseline
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and BaselineComputer Vision and Pattern Recognition (CVPR), 2023
Tiantian Geng
Teng Wang
Yanfu Zhang
Runmin Cong
Feng Zheng
181
59
0
22 Mar 2023
Audio-Driven Co-Speech Gesture Video Generation
Audio-Driven Co-Speech Gesture Video GenerationNeural Information Processing Systems (NeurIPS), 2022
Xian Liu
Qianyi Wu
Hang Zhou
Yuanqi Du
Wayne Wu
Dahua Lin
Ziwei Liu
SLRVGen
261
67
0
05 Dec 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
A Closer Look at Weakly-Supervised Audio-Visual Source LocalizationNeural Information Processing Systems (NeurIPS), 2022
Shentong Mo
Pedro Morgado
237
79
0
30 Aug 2022
Static and Dynamic Concepts for Self-supervised Video Representation
  Learning
Static and Dynamic Concepts for Self-supervised Video Representation LearningEuropean Conference on Computer Vision (ECCV), 2022
Rui Qian
Shuangrui Ding
Xian Liu
Dahua Lin
SSL
156
26
0
26 Jul 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Learning to Answer Questions in Dynamic Audio-Visual ScenariosComputer Vision and Pattern Recognition (CVPR), 2022
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
251
200
0
26 Mar 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture
  Generation
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture GenerationComputer Vision and Pattern Recognition (CVPR), 2022
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
211
133
0
24 Mar 2022
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation
Semantic-Aware Implicit Neural Audio-Driven Video Portrait GenerationEuropean Conference on Computer Vision (ECCV), 2022
Xian Liu
Yinghao Xu
Qianyi Wu
Hang Zhou
Wayne Wu
Bolei Zhou
VGenDiffM3DH
190
161
0
19 Jan 2022
Audio-Visual Collaborative Representation Learning for Dynamic Saliency
  Prediction
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
Hailong Ning
Bin Zhao
Zhanxuan Hu
Lang He
Ercheng Pei
236
12
0
17 Sep 2021
Vision-Infused Deep Audio Inpainting
Vision-Infused Deep Audio InpaintingIEEE International Conference on Computer Vision (ICCV), 2019
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
286
91
0
24 Oct 2019
1