1911.12667
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Neural Information Processing Systems (NeurIPS), 2019
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
Papers citing
"Self-Supervised Learning by Cross-Modal Audio-Video Clustering"
30 / 280 papers shown
Title
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Xavier Favory
Konstantinos Drossos
Maria Sandsten
Xavier Serra
199
16
0
27 Oct 2020
Self-supervised Co-training for Video Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
459
361
0
19 Oct 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
173
149
0
12 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
250
260
0
06 Oct 2020
Hard Negative Mixing for Contrastive Learning
Neural Information Processing Systems (NeurIPS), 2020
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
443
710
0
02 Oct 2020
Understanding Self-supervised Learning with Dual Deep Networks
Yuandong Tian
Lantao Yu
Xinlei Chen
Surya Ganguli
SSL
446
86
0
01 Oct 2020
SEMI: Self-supervised Exploration via Multisensory Incongruity
IEEE International Conference on Robotics and Automation (ICRA), 2020
Jianren Wang
Ziwen Zhuang
Hang Zhao
SSL
139
1
0
26 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
160
9
0
31 Aug 2020
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Jiangliu Wang
Jianbo Jiao
Linchao Bao
Shengfeng He
Wei Liu
Yunhui Liu
SSL
AI4TS
192
58
0
31 Aug 2020
Delving into Inter-Image Invariance for Unsupervised Visual Representations
International Journal of Computer Vision (IJCV), 2020
Jiahao Xie
Xiaohang Zhan
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
SSL
VLM
179
61
0
26 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction
Jiangliu Wang
Jianbo Jiao
Yunhui Liu
SSL
AI4TS
203
251
0
13 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
221
117
0
13 Aug 2020
What Should Not Be Contrastive in Contrastive Learning
International Conference on Learning Representations (ICLR), 2020
Tete Xiao
Xiaolong Wang
Alexei A. Efros
Trevor Darrell
SSL
DRL
277
330
0
13 Aug 2020
Spatiotemporal Contrastive Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2020
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
Jian Shu
Serge J. Belongie
Huayu Chen
SSL
AI4TS
373
543
0
09 Aug 2020
Memory-augmented Dense Predictive Coding for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
293
254
0
03 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Gaowen Liu
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
218
50
0
29 Jul 2020
Leveraging Category Information for Single-Frame Visual Sound Source Separation
Xiangjie Sui
Esa Rahtu
129
9
0
15 Jul 2020
Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision
Abhinav Shukla
Stavros Petridis
Maja Pantic
SSL
129
16
0
08 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
369
395
0
29 Jun 2020
Video Representation Learning with Visual Tempo Consistency
Ceyuan Yang
Yinghao Xu
Bo Dai
Bolei Zhou
146
94
0
28 Jun 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Neural Information Processing Systems (NeurIPS), 2020
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
253
161
0
24 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
202
142
0
16 Jun 2020
Video Understanding as Machine Translation
Bruno Korbar
Fabio Petroni
Rohit Girdhar
Lorenzo Torresani
SSL
195
29
0
12 Jun 2020
Are we done with ImageNet?
Lucas Beyer
Olivier J. Hénaff
Alexander Kolesnikov
Xiaohua Zhai
Aaron van den Oord
VLM
311
454
0
12 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Xiangjie Sui
Esa Rahtu
195
25
0
04 Jun 2020
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Haytham M. Fayek
Anurag Kumar
176
37
0
29 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
IEEE Transactions on Affective Computing (IEEE TAC), 2020
Abhinav Shukla
Stavros Petridis
Maja Pantic
SSL
377
33
0
04 May 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Computer Vision and Pattern Recognition (CVPR), 2020
Pedro Morgado
Nuno Vasconcelos
Ishan Misra
SSL
274
293
0
27 Apr 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
IEEE International Conference on Computer Vision (ICCV), 2020
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
184
53
0
09 Mar 2020
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Longlong Jing
Yingli Tian
SSL
388
1,881
0
16 Feb 2019