Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Neural Information Processing Systems (NeurIPS), 2019

28 November 2019

Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"

30 / 280 papers shown

Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020 Xavier Favory Konstantinos Drossos Maria Sandsten Xavier Serra 199 16 0 27 Oct 2020
Self-supervised Co-training for Video Representation Learning. Neural Information Processing Systems (NeurIPS), 2020 Tengda Han Weidi Xie Andrew Zisserman SSL 459 361 0 19 Oct 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching Di Hu Rui Qian Minyue Jiang Xiao Tan Shilei Wen Errui Ding Weiyao Lin Dejing Dou 173 149 0 12 Oct 2020
Support-set bottlenecks for video-text representation learning Mandela Patrick Po-Yao (Bernie) Huang Yuki M. Asano Florian Metze Alexander G. Hauptmann João Henriques Andrea Vedaldi 250 260 0 06 Oct 2020
Hard Negative Mixing for Contrastive Learning. Neural Information Processing Systems (NeurIPS), 2020 Yannis Kalantidis Mert Bulent Sariyildiz Noé Pion Philippe Weinzaepfel Diane Larlus SSL 443 710 0 02 Oct 2020
Understanding Self-supervised Learning with Dual Deep Networks Yuandong Tian Lantao Yu Xinlei Chen Surya Ganguli SSL 446 86 0 01 Oct 2020
SEMI: Self-supervised Exploration via Multisensory Incongruity. IEEE International Conference on Robotics and Automation (ICRA), 2020 Jianren Wang Ziwen Zhuang Hang Zhao SSL 139 1 0 26 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations Shuang Ma Zhaoyang Zeng Daniel J. McDuff Yale Song VLM SSL 160 9 0 31 Aug 2020
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020 Jiangliu Wang Jianbo Jiao Linchao Bao Shengfeng He Wei Liu Yunhui Liu SSL AI4TS 192 58 0 31 Aug 2020
Delving into Inter-Image Invariance for Unsupervised Visual Representations. International Journal of Computer Vision (IJCV), 2020 Jiahao Xie Xiaohang Zhan Ziwei Liu Yew-Soon Ong Chen Change Loy SSL VLM 179 61 0 26 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction Jiangliu Wang Jianbo Jiao Yunhui Liu SSL AI4TS 203 251 0 13 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning Ying Cheng Ruize Wang Zhihao Pan Rui Feng Yuejie Zhang SSL 221 117 0 13 Aug 2020
What Should Not Be Contrastive in Contrastive Learning. International Conference on Learning Representations (ICLR), 2020 Tete Xiao Xiaolong Wang Alexei A. Efros Trevor Darrell SSL DRL 277 330 0 13 Aug 2020
Spatiotemporal Contrastive Video Representation Learning. Computer Vision and Pattern Recognition (CVPR), 2020 Rui Qian Tianjian Meng Boqing Gong Ming-Hsuan Yang Jian Shu Serge J. Belongie Huayu Chen SSL AI4TS 373 543 0 09 Aug 2020
Memory-augmented Dense Predictive Coding for Video Representation Learning Tengda Han Weidi Xie Andrew Zisserman SSL 293 254 0 03 Aug 2020
Learning Video Representations from Textual Web Supervision Jonathan C. Stroud Zhichao Lu Chen Sun Gaowen Liu Rahul Sukthankar Cordelia Schmid David A. Ross SSL 218 50 0 29 Jul 2020
Leveraging Category Information for Single-Frame Visual Sound Source Separation Xiangjie Sui Esa Rahtu 129 9 0 15 Jul 2020
Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision Abhinav Shukla Stavros Petridis Maja Pantic SSL 129 16 0 08 Jul 2020
Self-Supervised MultiModal Versatile Networks Jean-Baptiste Alayrac Adrià Recasens R. Schneider Relja Arandjelović Jason Ramapuram J. Fauw Lucas Smaira Sander Dieleman Andrew Zisserman SSL 369 395 0 29 Jun 2020
Video Representation Learning with Visual Tempo Consistency Ceyuan Yang Yinghao Xu Bo Dai Bolei Zhou 146 94 0 28 Jun 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision. Neural Information Processing Systems (NeurIPS), 2020 Yuki M. Asano Mandela Patrick Christian Rupprecht Andrea Vedaldi SSL 253 161 0 24 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 202 142 0 16 Jun 2020
Video Understanding as Machine Translation Bruno Korbar Fabio Petroni Rohit Girdhar Lorenzo Torresani SSL 195 29 0 12 Jun 2020
Are we done with ImageNet? Lucas Beyer Olivier J. Hénaff Alexander Kolesnikov Xiaohua Zhai Aaron van den Oord VLM 311 454 0 12 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network Xiangjie Sui Esa Rahtu 195 25 0 04 Jun 2020
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data. International Joint Conference on Artificial Intelligence (IJCAI), 2020 Haytham M. Fayek Anurag Kumar 176 37 0 29 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition? IEEE Transactions on Affective Computing (IEEE TAC), 2020 Abhinav Shukla Stavros Petridis Maja Pantic SSL 377 33 0 04 May 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement. Computer Vision and Pattern Recognition (CVPR), 2020 Pedro Morgado Nuno Vasconcelos Ishan Misra SSL 274 293 0 27 Apr 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning. IEEE International Conference on Computer Vision (ICCV), 2020 Mandela Patrick Yuki M. Asano Polina Kuznetsova Ruth C. Fong João F. Henriques Geoffrey Zweig Andrea Vedaldi 184 53 0 09 Mar 2020
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey Longlong Jing Yingli Tian SSL 388 1,881 0 16 Feb 2019