Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.10724
Cited By
Sound Source Localization is All about Cross-Modal Alignment
19 September 2023
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Sound Source Localization is All about Cross-Modal Alignment"
19 / 19 papers shown
Title
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
35
0
0
08 May 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
32
0
0
04 Mar 2025
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
63
0
0
09 Dec 2024
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
Jongbhin Woo
H. Ryu
Youngjoon Jang
Jae-Won Cho
Joon Son Chung
16
0
0
17 Oct 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
24
2
0
07 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
27
9
0
01 Jul 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
32
8
0
20 May 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud
Yapeng Tian
Diana Marculescu
34
7
0
02 Apr 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
22
4
0
26 Mar 2024
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin
Jiajun Chen
Kunyu Peng
Xuan He
Zhiyong Li
Rainer Stiefelhagen
Kailun Yang
35
6
0
28 Feb 2024
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
8
6
0
07 Nov 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
18
2
0
28 Oct 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
6
5
0
06 Apr 2023
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
67
64
0
30 Aug 2022
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
SSL
175
382
0
29 Apr 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
174
196
0
08 Jan 2021
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
193
371
0
19 Oct 2020
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
229
3,029
0
09 Mar 2020
1