2011.01143
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
2 November 2020
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
Papers citing "Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds"
13 / 13 papers shown
Title
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
37
28
0
02 Jan 2025
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
19
18
0
19 Sep 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Egocentric Audio-Visual Noise Suppression
Roshan S. Sharma
Weipeng He
Ju Lin
Egor Lakomkin
Yang Liu
Kaustubh Kalgaonkar
EgoV
12
1
0
07 Nov 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
27
29
0
20 Jul 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
22
106
0
02 Mar 2022
Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder
Kristen Grauman
11
21
0
02 Feb 2022
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
13
27
0
21 Nov 2021
Cross-Modality Fusion Transformer for Multispectral Object Detection
Q. Fang
D. Han
Zhaokui Wang
ViT
8
139
0
30 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
13
267
0
21 Oct 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
23
536
0
30 Jun 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Yuma Koizumi
Shigeki Karita
Scott Wisdom
Hakan Erdogan
J. Hershey
Llion Jones
M. Bacchiani
19
41
0
30 Jun 2021
Source separation with weakly labelled data: An approach to computational auditory scene analysis
Qiuqiang Kong
Yuxuan Wang
Xuchen Song
Yin Cao
Wenwu Wang
Mark D. Plumbley
19
47
0
06 Feb 2020