Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi M. Kalayeh
arXiv: 2304.05600 · 12 April 2023
Papers citing "Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning" (8 of 8 papers shown)
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
Kevin Cai, Chonghua Liu, David M. Chan · 10 Jan 2024 (VGen)

BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu, Zhiling Zhang, Zelin Zhou, Pingyue Zhang, Zeyu Xie, Mengyue Wu, Ke Zhu · 14 Mar 2023 (CLIP)

The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu, Chen An Li, Tzu-Han Lin, Tsung-Yuan Hsu, Hung-yi Lee · 26 Sep 2022

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Wonjune Kang, M. Hasegawa-Johnson, D. Roy · 19 May 2022

Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang, Pauline Luc, Adrià Recasens, Jean-Baptiste Alayrac, Aaron van den Oord · 26 Apr 2021 (SSL)

Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, P. Swietojanski, João Monteiro, J. Trmal, Yoshua Bengio · 25 Jan 2020 (SSL)

Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer · 23 Jan 2020

VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani, Andrew Zisserman · 14 Jun 2018