Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Neural Information Processing Systems (NeurIPS), 2019

28 November 2019

Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"

50 / 280 papers shown

Title
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training. Neural Information Processing Systems (NeurIPS), 2022 Zhan Tong Yibing Song Jue Wang Limin Wang ViT 576 1,573 0 23 Mar 2022
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation. European Conference on Computer Vision (ECCV), 2022 Antonín Vobecký David Hurych Oriane Siméoni Spyros Gidaris Andrei Bursuc Patrick Pérez Josef Sivic 3DPC 217 28 0 21 Mar 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language. Computer Vision and Pattern Recognition (CVPR), 2022 Otniel-Bogdan Mercea Lukas Riesch A. Sophia Koepke Zeynep Akata 130 54 0 07 Mar 2022
Audio Self-supervised Learning: A Survey. Patterns (Patterns), 2022 Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 210 125 0 02 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition. IEEE International Conference on Image Processing (ICIP), 2022 Zitian Zhang Jie Zhang Jian-Shu Zhang Ming Wu Xin Fang Lirong Dai SSL 221 12 0 15 Feb 2022
Visual Acoustic Matching. Computer Vision and Pattern Recognition (CVPR), 2022 Changan Chen Ruohan Gao P. Calamia Kristen Grauman 256 65 0 14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing. AAAI Conference on Artificial Intelligence (AAAI), 2022 Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 143 30 0 13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition Arda Senocak Junsik Kim Tae-Hyun Oh H. Ryu Dingzeyu Li In So Kweon 109 1 0 12 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models. IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022 Kayode Olaleye Dan Oneaţă Herman Kamper 148 7 0 02 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection. Computer Vision and Pattern Recognition (CVPR), 2022 A. Haliassos Rodrigo Mira Stavros Petridis Maja Pantic CVBM 288 167 0 18 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions. Computer Vision and Pattern Recognition (CVPR), 2022 Yuying Ge Yixiao Ge Xihui Liu Dian Li Ying Shan Xiaohu Qie Ping Luo BDL 205 120 0 13 Jan 2022
Robust Contrastive Learning against Noisy Views. Computer Vision and Pattern Recognition (CVPR), 2022 Ching-Yao Chuang R. Devon Hjelm Xin Eric Wang Vibhav Vineet Neel Joshi Antonio Torralba Stefanie Jegelka Ya-heng Song NoLa 125 87 0 12 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. International Conference on Learning Representations (ICLR), 2022 Bowen Shi Wei-Ning Hsu Kushal Lakhotia Abdel-rahman Mohamed SSL 268 404 0 05 Jan 2022
Fine-grained Multi-Modal Self-Supervised Learning. British Machine Vision Conference (BMVC), 2021 Duo Wang S. Karout SSL 102 7 0 22 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021 Di Hu Yake Wei Rui Qian Weiyao Lin Ruihua Song Ji-Rong Wen 148 47 0 22 Dec 2021
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation Yujia Zhang L. Po Xuyuan Xu Mengyang Liu Yexin Wang Weifeng Ou Yuzhi Zhao Weikang Yu SSL AI4TS 199 18 0 16 Dec 2021
Anomaly Crossing: New Horizons for Video Anomaly Detection as Cross-domain Few-shot Learning Guangyu Sun Zhangpu Liu Lianggong Wen Jing Shi Chenliang Xu 146 3 0 12 Dec 2021
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision Liangzhe Yuan Rui Qian Huayu Chen Boqing Gong Florian Schroff Ming-Hsuan Yang Hartwig Adam Ting Liu AI4TS 177 17 0 09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning Rui Qian Yeqing Li Liangzhe Yuan Boqing Gong Ting Liu Matthew A. Brown Serge Belongie Ming-Hsuan Yang Hartwig Adam Huayu Chen AI4TS 180 7 0 08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval Nina Shvetsova Brian Chen Andrew Rouditchenko Samuel Thomas Brian Kingsbury Rogerio Feris David Harwath James R. Glass Hilde Kuehne ViT 262 152 0 08 Dec 2021
Audio-Visual Synchronisation in the wild Honglie Chen Weidi Xie Triantafyllos Afouras Arsha Nagrani Andrea Vedaldi Andrew Zisserman 179 49 0 08 Dec 2021
Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation Amirhossein Dadashzadeh Alan Whone Majid Mirmehdi SSL 270 4 0 07 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning Srijan Das Michael S. Ryoo SSL 250 1 0 07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning Manlin Zhang Jinpeng Wang A. J. Ma 140 9 0 07 Dec 2021
Time-Equivariant Contrastive Video Representation Learning Simon Jenni Hailin Jin SSL AI4TS 306 61 0 07 Dec 2021
TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning Yang Liu Keze Wang Lingbo Liu Hao Lan Liang Lin SSL AI4TS 231 143 0 07 Dec 2021
Self-supervised Video Transformer Kanchana Ranasinghe Muzammal Naseer Salman Khan Fahad Shahbaz Khan Michael S. Ryoo ViT 292 104 0 02 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation Dipika Singhania R. Rahaman Angela Yao 185 28 0 02 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks Kevin Duarte Brian Chen Nina Shvetsova Andrew Rouditchenko Samuel Thomas Alexander H. Liu David Harwath James R. Glass Hilde Kuehne M. Shah SSL 104 5 0 01 Dec 2021
Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations Semih Günel Florian Aymanns S. Honari Pavan Ramdya Pascal Fua SSL 150 0 0 29 Nov 2021
ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics. Computer Vision and Pattern Recognition (CVPR), 2021 Aiham Taleb Matthias Kirchler Remo Monti Christoph Lippert SSL MedIm 350 69 0 26 Nov 2021
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes Suhani Vora Noha Radwan Klaus Greff H. Meyer Kyle Genova Mehdi S. M. Sajjadi Etienne Pot Andrea Tagliasacchi Daniel Duckworth 280 139 0 25 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition. Computer Vision and Pattern Recognition (CVPR), 2021 Junfei Xiao Longlong Jing Lin Zhang Ju He Qi She Zongwei Zhou Alan Yuille Yingwei Li 215 65 0 25 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing Jiashuo Yu Ying Cheng Ruiwei Zhao Rui Feng Yuejie Zhang 177 80 0 24 Nov 2021
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. AAAI Conference on Artificial Intelligence (AAAI), 2021 Pritam Sarkar Ali Etemad SSL 288 15 0 09 Nov 2021
Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021 Jinghao Zhang Yanqiao Zhu Qiang Liu Mengqi Zhang Shu Wu Liang Wang 226 72 0 01 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021 Ho-Hsiang Wu Prem Seetharaman Kundan Kumar J. P. Bello CLIP VLM 259 319 0 21 Oct 2021
Learning 3D Semantic Segmentation with only 2D Image Supervision. International Conference on 3D Vision (3DV), 2021 Kyle Genova Xiaoqi Yin Abhijit Kundu C. Pantofaru Forrester Cole Avneesh Sud B. Brewington B. Shucker Thomas Funkhouser 3DPC 120 91 0 21 Oct 2021
Constrained Mean Shift for Representation Learning Ajinkya Tejankar Soroush Abbasi Koohpayegani Hamed Pirsiavash SSL 136 0 0 19 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition M. Planamente Chiara Plizzari Emanuele Alberti Barbara Caputo EgoV 229 48 0 19 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges Linus Ericsson Henry Gouk Chen Change Loy Timothy M. Hospedales SSL OOD AI4TS 194 334 0 18 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 131 0 0 13 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning Chongjian Ge Youwei Liang Yibing Song Jianbo Jiao Jue Wang Ping Luo ViT 114 35 0 11 Oct 2021
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging Shuangrui Ding Maomao Li Tianyu Yang Rui Qian Haohang Xu Qingyi Chen Jue Wang Hongkai Xiong SSL 220 61 0 30 Sep 2021
Click-through Rate Prediction with Auto-Quantized Contrastive Learning Yujie Pan Jiangchao Yao Bo Han Kunyang Jia Ya Zhang Hongxia Yang MQ 153 19 0 27 Sep 2021
Self-Supervised Video Representation Learning by Video Incoherence Detection. IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021 Haozhi Cao Yuecong Xu Jianfei Yang K. Mao Lihua Xie Jianxiong Yin Simon See SSL 104 8 0 26 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation Xiangjie Sui Esa Rahtu 206 12 0 18 Sep 2021
Learning Cross-modal Contrastive Features for Video Domain Adaptation. IEEE International Conference on Computer Vision (ICCV), 2021 Donghyun Kim Yi-Hsuan Tsai Bingbing Zhuang Xiang Yu Stan Sclaroff Kate Saenko Manmohan Chandraker 136 83 0 26 Aug 2021
Self-Supervised Video Representation Learning with Meta-Contrastive Network Yuanze Lin Xun Guo Yan Lu SSL 189 43 0 19 Aug 2021
TrUMAn: Trope Understanding in Movies and Animations. International Conference on Information and Knowledge Management (CIKM), 2021 Hung-Ting Su Po-Wei Shen Bing-Chen Tsai Wen-Feng Cheng Ke-Jyun Wang Winston H. Hsu 122 6 0 10 Aug 2021