Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

50 / 137 papers shown

Title
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound Yan-Bo Lin Jie Lei Joey Tianyi Zhou Gedas Bertasius 54 39 0 06 Apr 2022
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices V. S. Kadandale Juan F. Montesinos G. Haro 27 23 0 05 Apr 2022
The Sound of Bounding-Boxes Takashi Oya Shohei Iwase Shigeo Morishima 19 2 0 30 Mar 2022
Localizing Visual Sounds the Easy Way Shentong Mo Pedro Morgado 26 78 0 17 Mar 2022
Object discovery and representation networks Olivier J. Hénaff Skanda Koppula Evan Shelhamer Daniel Zoran Andrew Jaegle Andrew Zisserman João Carreira Relja Arandjelović 44 87 0 16 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 40 106 0 02 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition Zitian Zhang Jie Zhang Jian-Shu Zhang Ming Wu Xin Fang Lirong Dai SSL 41 10 0 15 Feb 2022
Visual Acoustic Matching Changan Chen Ruohan Gao P. Calamia Kristen Grauman 21 56 0 14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 18 25 0 13 Feb 2022
Real-time Emergency Vehicle Event Detection Using Audio Data Zubayer Islam Mohamed Abdel-Aty 16 5 0 03 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection A. Haliassos Rodrigo Mira Stavros Petridis M. Pantic CVBM 40 126 0 18 Jan 2022
SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data Minh-Khoi Tran Loi Ly Binh-Son Hua Ngan Le 3DPC MedIm 30 17 0 15 Jan 2022
Robust Contrastive Learning against Noisy Views Ching-Yao Chuang R. Devon Hjelm Xin Wang Vibhav Vineet Neel Joshi Antonio Torralba Stefanie Jegelka Ya-heng Song NoLa 16 68 0 12 Jan 2022
Progressive Video Summarization via Multimodal Self-supervised Learning Haopeng Li Qiuhong Ke Mingming Gong Tom Drummond AI4TS 39 18 0 07 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Bowen Shi Wei-Ning Hsu Kushal Lakhotia Abdel-rahman Mohamed SSL 52 306 0 05 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks A. Vasudevan Dengxin Dai Luc Van Gool SSL 38 6 0 04 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 38 23 0 09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning Rui Qian Yeqing Li Liangzhe Yuan Boqing Gong Ting Liu Matthew A. Brown Serge Belongie Ming-Hsuan Yang Hartwig Adam Huayu Chen AI4TS 61 6 0 08 Dec 2021
Time-Equivariant Contrastive Video Representation Learning Simon Jenni Hailin Jin SSL AI4TS 143 60 0 07 Dec 2021
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup Siyuan Li Zicheng Liu Zedong Wang Di Wu Zihan Liu Stan Z. Li 37 26 0 30 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 36 20 0 15 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 29 8 0 26 Oct 2021
Constrained Mean Shift for Representation Learning Ajinkya Tejankar Soroush Abbasi Koohpayegani Hamed Pirsiavash SSL 45 0 0 19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 34 0 0 13 Oct 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning Zixu Zhao Yueming Jin Pheng-Ann Heng SSL 42 21 0 28 Sep 2021
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning Wei Feng Yuanjiang Wang Lihua Ma Ye Yuan Chi Zhang SSL 21 13 0 24 Aug 2021
Learning to Cut by Watching Movies Alejandro Pardo Fabian Caba Heilbron Juan Carlos León Alcázar Ali K. Thabet Guohao Li VGen 58 20 0 09 Aug 2021
Towards Long-Form Video Understanding Chaoxia Wu Philipp Krahenbuhl VLM ViT 54 166 0 21 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision Pingchuan Ma Rodrigo Mira Stavros Petridis Björn W. Schuller M. Pantic SSL 24 53 0 16 Jun 2021
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition Yikang Shen Chun-Fu Chen Quanfu Fan Ximeng Sun Kate Saenko A. Oliva Rogerio Feris 36 47 0 11 May 2021
Contrastive Attraction and Contrastive Repulsion for Representation Learning Huangjie Zheng Xu Chen Jiangchao Yao Hongxia Yang Chunyuan Li Ya Zhang Hao Zhang Ivor Tsang Jingren Zhou Mingyuan Zhou SSL 42 12 0 08 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning Christoph Feichtenhofer Haoqi Fan Bo Xiong Ross B. Girshick Kaiming He SSL AI4TS 39 257 0 29 Apr 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations Lingyu Zhu Esa Rahtu 26 25 0 17 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios Xudong Xu Hang Zhou Ziwei Liu Bo Dai Xiaogang Wang Dahua Lin DiffM 13 55 0 13 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks? Yapeng Tian Chenliang Xu AAML 36 37 0 05 Apr 2021
Cross-Modal learning for Audio-Visual Video Parsing Jatin Lamba Abhishek Jayaprakash Akula Rishabh Dabral P. Jyothi Ganesh Ramakrishnan 13 7 0 03 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video Bo Xiong Haoqi Fan Kristen Grauman Christoph Feichtenhofer SSL 22 49 0 01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning Yan-Bo Lin Hung-Yu Tseng Hsin-Ying Lee Yen-Yu Lin Ming-Hsuan Yang SSL 27 34 0 01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning Adrià Recasens Pauline Luc Jean-Baptiste Alayrac Luyu Wang Ross Hemsley ... Florent Altché M. Valko Jean-Bastien Grill Aaron van den Oord Andrew Zisserman SSL AI4TS 33 127 0 30 Mar 2021
Robust Audio-Visual Instance Discrimination Pedro Morgado Ishan Misra Nuno Vasconcelos SSL 22 110 0 29 Mar 2021
Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting A. Bhunia Pinaki Nath Chowdhury Yongxin Yang Timothy M. Hospedales Tao Xiang Yi-Zhe Song SSL 20 59 0 25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 29 33 0 18 Mar 2021
Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations Xiang Gao Wei Hu Guo-Jun Qi 32 6 0 01 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation Ye Zhu Yu Wu Hugo Latapie Yi Yang Yan Yan SSL 44 20 0 05 Feb 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning Sangho Lee Jiwan Chung Youngjae Yu Gunhee Kim Thomas Breuel Gal Chechik Yale Song 71 45 0 26 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts Kunpeng Li Zizhao Zhang Guanhang Wu Xuehan Xiong Chen-Yu Lee Zhichao Lu Y. Fu Tomas Pfister 34 5 0 11 Jan 2021
Transformers in Vision: A Survey Salman Khan Muzammal Naseer Munawar Hayat Syed Waqas Zamir Fahad Shahbaz Khan M. Shah ViT 227 2,434 0 04 Jan 2021
Semantic Audio-Visual Navigation Changan Chen Ziad Al-Halah Kristen Grauman 50 104 0 21 Dec 2020
InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees Nghi D. Q. Bui Yijun Yu Lingxiao Jiang SSL 44 104 0 13 Dec 2020
A Comprehensive Study of Deep Video Action Recognition Yi Zhu Xinyu Li Chunhui Liu Mohammadreza Zolfaghari Yuanjun Xiong Chongruo Wu Zhi-Li Zhang Joseph Tighe R. Manmatha Mu Li VLM AI4TS 38 185 0 11 Dec 2020