Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

37 / 137 papers shown

Title
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks Humam Alwassel Silvio Giancola Guohao Li 33 123 0 23 Nov 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning Zehua Zhang David J. Crandall AI4TS SSL 28 23 0 23 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations Linchao Zhu Yi Yang ViT 49 417 0 14 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment Pedro Morgado Yi Li Nuno Vasconcelos SSL 27 121 0 03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds Efthymios Tzinis Scott Wisdom A. Jansen Shawn Hershey Tal Remez D. Ellis J. Hershey 39 69 0 02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning L. Tao Xueting Wang T. Yamasaki VLM SSL 23 14 0 29 Oct 2020
Hard Negative Mixing for Contrastive Learning Yannis Kalantidis Mert Bulent Sariyildiz Noé Pion Philippe Weinzaepfel Diane Larlus SSL 53 628 0 02 Oct 2020
Sense and Learn: Self-Supervision for Omnipresent Sensors Aaqib Saeed Victor Ungureanu Beat Gfeller OOD SSL 22 39 0 28 Sep 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning Ying Cheng Ruize Wang Zhihao Pan Rui Feng Yuejie Zhang SSL 36 106 0 13 Aug 2020
Spatiotemporal Contrastive Video Representation Learning Rui Qian Tianjian Meng Boqing Gong Ming-Hsuan Yang Haoran Wang Serge J. Belongie Huayu Chen SSL AI4TS 41 492 0 09 Aug 2020
Learning Video Representations from Textual Web Supervision Jonathan C. Stroud Zhichao Lu Chen Sun Jia Deng Rahul Sukthankar Cordelia Schmid David A. Ross SSL 40 48 0 29 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing Yapeng Tian Dingzeyu Li Chenliang Xu 34 180 0 21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation Hang Zhou Xudong Xu Dahua Lin Xiaogang Wang Ziwei Liu DiffM 32 81 0 20 Jul 2020
Self-Supervised MultiModal Versatile Networks Jean-Baptiste Alayrac Adrià Recasens R. Schneider Relja Arandjelović Jason Ramapuram J. Fauw Lucas Smaira Sander Dieleman Andrew Zisserman SSL 40 372 0 29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 22 141 0 16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound Karren D. Yang Bryan C. Russell Justin Salamon SSL 24 75 0 11 Jun 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition? Abhinav Shukla Stavros Petridis M. Pantic SSL 32 28 0 04 May 2020
Conditioned Source Separation for Music Instrument Performances Olga Slizovskaia G. Haro E. Gómez 30 38 0 08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition Arsha Nagrani Chen Sun David A. Ross Rahul Sukthankar Cordelia Schmid Andrew Zisserman 33 54 0 30 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos Daniel Gordon Kiana Ehsani Dieter Fox Ali Farhadi SSL AI4TS 29 87 0 18 Mar 2020
Cross-modal Learning for Multi-modal Video Categorization Palash Goyal Saurabh Sahu Shalini Ghosh Chul Lee 15 8 0 07 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning Elad Amrani Rami Ben-Ari Daniel Rotman A. Bronstein 17 121 0 06 Mar 2020
Evolving Losses for Unsupervised Video Representation Learning A. Piergiovanni A. Angelova Michael S. Ryoo SSL 27 138 0 26 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani Joon Son Chung Samuel Albanie Andrew Zisserman SSL 21 88 0 20 Feb 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 197 207 0 23 Jan 2020
Deep Audio-Visual Learning: A Survey Hao Zhu Mandi Luo Rui Wang A. Zheng Ran He 31 156 0 14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network A. Tsiami Petros Koutras Petros Maragos 27 73 0 09 Jan 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering Humam Alwassel D. Mahajan Bruno Korbar Lorenzo Torresani Guohao Li Du Tran SSL 42 428 0 28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications Arda Senocak Tae-Hyun Oh Junsik Kim Ming-Hsuan Yang In So Kweon SSL 33 52 0 20 Nov 2019
Recursive Visual Sound Separation Using Minus-Plus Net Xudong Xu Bo Dai Dahua Lin 35 91 0 30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen EgoV 16 332 0 22 Aug 2019
Multi-task Self-Supervised Learning for Human Activity Detection Aaqib Saeed T. Ozcelebi J. Lukkien SSL 23 270 0 27 Jul 2019
Evolving Losses for Unlabeled Video Representation Learning A. Piergiovanni A. Angelova Michael S. Ryoo SSL 11 7 0 07 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard? Weiyao Wang Du Tran Matt Feiszli 31 442 0 29 May 2019
Self-supervised audio representation learning for mobile devices Marco Tagliasacchi Beat Gfeller Félix de Chaumont Quitry Dominik Roblek SSL AI4TS 6 46 0 24 May 2019
DynamoNet: Dynamic Action and Motion Network Ali Diba Vivek Sharma Luc Van Gool Rainer Stiefelhagen 30 110 0 25 Apr 2019
Revisiting Self-Supervised Visual Representation Learning Alexander Kolesnikov Xiaohua Zhai Lucas Beyer SSL 53 715 0 25 Jan 2019