2004.09476
Cited By
Music Gesture for Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2020
20 April 2020
Chuang Gan
Deng Huang
Hang Zhao
J. Tenenbaum
Antonio Torralba
Papers citing
"Music Gesture for Visual Sound Separation"
50 / 131 papers shown
AV-Master: Dual-Path Comprehensive Perception Makes Better Audio-Visual Question Answering
Jiayu Zhang
Qilang Ye
Shuo Ye
Xun Lin
Zihan Song
Zitong Yu
173
0
0
21 Oct 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
244
4
0
26 Sep 2025
Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration
Xingmei Wang
Xiaoyu Hu
Chengkai Huang
Ziyan Zeng
Guohao Nie
Quan Z. Sheng
L. Yao
3DPC
164
1
0
19 Sep 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
424
34
0
02 Jan 2025
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Vinan
540
3
0
24 Nov 2024
Continual Audio-Visual Sound Separation
Neural Information Processing Systems (NeurIPS), 2024
Weiguo Pian
Yiyang Nan
Shijian Deng
Shentong Mo
Yunhui Guo
Yapeng Tian
VLM
CLL
414
6
0
05 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Neural Information Processing Systems (NeurIPS), 2024
Shentong Mo
Yibing Song
308
4
0
30 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
316
3
0
31 Aug 2024
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
ACM Multimedia (ACM MM), 2024
Xiang He
Xiangxi Liu
Yang Li
Dongcheng Zhao
Guobin Shen
Qingqun Kong
Xin Yang
Yi Zeng
313
17
0
04 Aug 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
270
21
0
30 Jul 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
477
10
0
18 Jul 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
291
9
0
18 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
357
5
0
04 Jul 2024
Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
Zongyue Qin
Yunsheng Bai
Atefeh Sohrabizadeh
Zijian Ding
Ziniu Hu
Yizhou Sun
Jason Cong
460
14
0
13 Jun 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
214
14
0
07 Jun 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
447
1
0
27 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
247
34
0
08 Mar 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Computer Vision and Pattern Recognition (CVPR), 2024
Yining Hong
Zishuo Zheng
Peihao Chen
Yian Wang
Junyan Li
Chuang Gan
336
58
0
16 Jan 2024
Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yukun Zuo
Hantao Yao
Liansheng Zhuang
Changsheng Xu
400
6
0
11 Jan 2024
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Pedro Morgado
306
36
0
02 Dec 2023
Weakly-Supervised Audio-Visual Segmentation
Neural Information Processing Systems (NeurIPS), 2023
Shentong Mo
Bhiksha Raj
VOS
354
24
0
25 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yating Xu
Conghui Hu
Gim Hee Lee
223
8
0
14 Nov 2023
Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation
Knowledge-Based Systems (KBS), 2023
Zhaojian Li
Jiangwei Zhong
Yuan Yuan
316
9
0
13 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuxin Ye
Wenming Yang
Yapeng Tian
283
12
0
31 Oct 2023
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation
Yiyang Su
Ali Vosoughi
Shijian Deng
Yapeng Tian
Chenliang Xu
284
5
0
18 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Xiulong Liu
Zhikang Dong
Peng Zhang
268
38
0
10 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
317
39
0
19 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
251
34
0
11 Sep 2023
AdVerb: Visually Guided Audio Dereverberation
IEEE International Conference on Computer Vision (ICCV), 2023
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
285
20
0
23 Aug 2023
Audio-Visual Class-Incremental Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
275
44
0
21 Aug 2023
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
ACM Multimedia (ACM MM), 2023
Guangyao Li
Wenxuan Hou
Di Hu
325
50
0
10 Aug 2023
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Asian Conference on Computer Vision (ACCV), 2023
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
223
4
0
31 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
340
17
0
05 Jul 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Zengjie Song
Zhaoxiang Zhang
222
5
0
19 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
380
15
0
01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
International Conference on Machine Learning (ICML), 2023
Shentong Mo
Pedro Morgado
255
27
0
30 May 2023
ProgSG: Cross-Modality Representation Learning for Programs in Electronic Design Automation
Yunsheng Bai
Atefeh Sohrabizadeh
Zongyue Qin
Ziniu Hu
Luke Huan
Jason Cong
454
1
0
18 May 2023
DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Fa-Ting Hong
Li Shen
Dan Xu
3DH
CVBM
306
35
0
10 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Computer Vision and Image Understanding (CVIU), 2023
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
642
15
0
05 May 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
450
109
0
31 Mar 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Computer Vision and Pattern Recognition (CVPR), 2023
Shentong Mo
Yapeng Tian
241
69
0
29 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
374
29
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
348
50
0
23 Mar 2023
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
335
42
0
07 Dec 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Neural Information Processing Systems (NeurIPS), 2022
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
231
14
0
29 Oct 2022
Pay Self-Attention to Audio-Visual Navigation
British Machine Vision Conference (BMVC), 2022
Yinfeng Yu
Lele Cao
Gang Hua
Xiaohong Liu
Liejun Wang
382
18
0
04 Oct 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Neural Information Processing Systems (NeurIPS), 2022
Shentong Mo
Pedro Morgado
295
85
0
30 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
340
76
0
20 Aug 2022
ConceptBeam: Concept Driven Target Speech Extraction
ACM Multimedia (ACM MM), 2022
Yasunori Ohishi
Marc Delcroix
Tsubasa Ochiai
S. Araki
Daiki Takeuchi
Daisuke Niizumi
Akisato Kimura
Noboru Harada
K. Kashino
261
24
0
25 Jul 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
European Conference on Computer Vision (ECCV), 2022
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
348
36
0
20 Jul 2022