arXiv:2303.17056
Audio-Visual Grouping Network for Sound Localization from Mixtures
Computer Vision and Pattern Recognition (CVPR), 2023
29 March 2023
Shentong Mo
Yapeng Tian
Papers citing
"Audio-Visual Grouping Network for Sound Localization from Mixtures"
26 / 26 papers shown
Decoupled Audio-Visual Dataset Distillation
Wenyuan Li
Guang Li
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
78
0
0
22 Nov 2025
Segmenting Collision Sound Sources in Egocentric Videos
Kranti Parida
Omar Emara
Hazel Doughty
Dima Damen
VOS
186
0
0
17 Nov 2025
Cross-Modal Alignment via Variational Copula Modelling
Feng Wu
Tsai Hor Chan
Fuying Wang
Guosheng Yin
Lequan Yu
76
0
0
05 Nov 2025
Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
Tsai Hor Chan
Feng Wu
Yihang Chen
Guosheng Yin
Lequan Yu
121
0
0
23 Oct 2025
Complementary and Contrastive Learning for Audio-Visual Segmentation
IEEE transactions on multimedia (TMM), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Pingping Zhang
Huchuan Lu
VOS
174
2
0
11 Oct 2025
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola
G. Morais
Magdalena Fuentes
Gloria Haro
SSL
132
0
0
29 Aug 2025
Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha
Tianyu Li
G. Wang
Peng Wang
Yangyang Wu
Yang Yang
Heng Tao Shen
VOS
CML
126
0
0
28 Jul 2025
Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration
Siyi Xie
Hanxin Zhu
Tianyu He
X. Li
Zhibo Chen
VGen
175
2
0
18 Jun 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
217
0
0
08 May 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Computer Vision and Pattern Recognition (CVPR), 2025
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
338
2
0
21 Apr 2025
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Computer Vision and Pattern Recognition (CVPR), 2025
Mingfei Chen
I. D. Gebru
Ishwarya Ananthabhotla
Christian Richardt
Dejan Marković
Jake Sandakly
Steven Krenn
Todd Keebler
Eli Shlizerman
Alexander Richard
240
2
0
08 Apr 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
171
1
0
17 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
188
3
0
15 Mar 2025
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yuhang He
Sangyun Shin
Anoop Cherian
Niki Trigoni
Andrew Markham
388
0
0
31 Dec 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xavier Juanola
Gloria Haro
Magdalena Fuentes
296
4
0
01 Oct 2024
Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities
Yidi Li
Yihan Li
Yixin Guo
Bin Ren
Zhenhuan Xu
Hao Guo
Hong Liu
Andrii Zadaianchuk
388
0
0
26 Aug 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
228
2
0
12 May 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
193
16
0
05 Mar 2024
More than Vanilla Fusion: a Simple, Decoupling-free, Attention Module for Multimodal Fusion Based on Signal Theory
Peiwen Sun
Yifan Zhang
Zishan Liu
Donghao Chen
Honggang Zhang
213
0
0
12 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Pedro Morgado
226
27
0
02 Dec 2023
Audio-Visual Instance Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
289
11
0
28 Oct 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
144
31
0
11 Sep 2023
Audio-Visual Class-Incremental Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
154
33
0
21 Aug 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
International Conference on Machine Learning (ICML), 2023
Shentong Mo
Pedro Morgado
166
25
0
30 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
100
17
0
22 May 2023
Audio-Visual Segmentation with Semantics
International Journal of Computer Vision (IJCV), 2023
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
144
71
0
30 Jan 2023