ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.10724
  4. Cited By
Sound Source Localization is All about Cross-Modal Alignment

Sound Source Localization is All about Cross-Modal Alignment

19 September 2023
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
ArXivPDFHTML

Papers citing "Sound Source Localization is All about Cross-Modal Alignment"

19 / 19 papers shown
Title
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
35
0
0
08 May 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
32
0
0
04 Mar 2025
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal
  Latent Alignment
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
63
0
0
09 Dec 2024
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text
  Understanding
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
Jongbhin Woo
H. Ryu
Youngjoon Jang
Jae-Won Cho
Joon Son Chung
16
0
0
17 Oct 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
24
2
0
07 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and
  Time
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
27
9
0
01 Jul 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
32
8
0
20 May 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud
Yapeng Tian
Diana Marculescu
34
7
0
02 Apr 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior
  Source Knowledge
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
22
4
0
26 Mar 2024
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous
  Driving
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin
Jiajun Chen
Kunyu Peng
Xuan He
Zhiyong Li
Rainer Stiefelhagen
Kailun Yang
35
6
0
28 Feb 2024
Can CLIP Help Sound Source Localization?
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
8
6
0
07 Nov 2023
Audio-Visual Instance Segmentation
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
18
2
0
28 Oct 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual
  Segmentation
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
6
5
0
06 Apr 2023
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
67
64
0
30 Aug 2022
With a Little Help from My Friends: Nearest-Neighbor Contrastive
  Learning of Visual Representations
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
SSL
175
382
0
29 Apr 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
174
196
0
08 Jan 2021
Self-supervised Co-training for Video Representation Learning
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
193
371
0
19 Oct 2020
Improved Baselines with Momentum Contrastive Learning
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
229
3,029
0
09 Mar 2020
1