Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Neural Information Processing Systems (NeurIPS), 2019
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
    SSL

Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"

50 / 280 papers shown
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Neural Information Processing Systems (NeurIPS), 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
576
1,573
0
23 Mar 2022
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation
European Conference on Computer Vision (ECCV), 2022
Antonín Vobecký
David Hurych
Oriane Siméoni
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
3DPC
217
28
0
21 Mar 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
130
54
0
07 Mar 2022
Audio Self-supervised Learning: A Survey
Patterns, 2022
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
210
125
0
02 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
IEEE International Conference on Image Processing (ICIP), 2022
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
221
12
0
15 Feb 2022
Visual Acoustic Matching
Computer Vision and Pattern Recognition (CVPR), 2022
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
256
65
0
14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
143
30
0
13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak
Junsik Kim
Tae-Hyun Oh
H. Ryu
Dingzeyu Li
In So Kweon
109
1
0
12 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Kayode Olaleye
Dan Oneaţă
Herman Kamper
148
7
0
02 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
Computer Vision and Pattern Recognition (CVPR), 2022
A. Haliassos
Rodrigo Mira
Stavros Petridis
Maja Pantic
CVBM
288
167
0
18 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Computer Vision and Pattern Recognition (CVPR), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
205
120
0
13 Jan 2022
Robust Contrastive Learning against Noisy Views
Computer Vision and Pattern Recognition (CVPR), 2022
Ching-Yao Chuang
R. Devon Hjelm
Xin Eric Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Ya-heng Song
NoLa
125
87
0
12 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
International Conference on Learning Representations (ICLR), 2022
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
268
404
0
05 Jan 2022
Fine-grained Multi-Modal Self-Supervised Learning
British Machine Vision Conference (BMVC), 2021
Duo Wang
S. Karout
SSL
102
7
0
22 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
148
47
0
22 Dec 2021
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation
Yujia Zhang
L. Po
Xuyuan Xu
Mengyang Liu
Yexin Wang
Weifeng Ou
Yuzhi Zhao
Weikang Yu
SSL AI4TS
199
18
0
16 Dec 2021
Anomaly Crossing: New Horizons for Video Anomaly Detection as Cross-domain Few-shot Learning
Guangyu Sun
Zhangpu Liu
Lianggong Wen
Jing Shi
Chenliang Xu
146
3
0
12 Dec 2021
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
Liangzhe Yuan
Rui Qian
Huayu Chen
Boqing Gong
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
AI4TS
177
17
0
09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
180
7
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
262
152
0
08 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
179
49
0
08 Dec 2021
Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation
Amirhossein Dadashzadeh
Alan Whone
Majid Mirmehdi
SSL
270
4
0
07 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Srijan Das
Michael S. Ryoo
SSL
250
1
0
07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
140
9
0
07 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL AI4TS
306
61
0
07 Dec 2021
TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning
Yang Liu
Keze Wang
Lingbo Liu
Hao Lan
Liang Lin
SSL AI4TS
231
143
0
07 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
292
104
0
02 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
185
28
0
02 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
Samuel Thomas
Alexander H. Liu
David Harwath
James R. Glass
Hilde Kuehne
M. Shah
SSL
104
5
0
01 Dec 2021
Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations
Semih Günel
Florian Aymanns
S. Honari
Pavan Ramdya
Pascal Fua
SSL
150
0
0
29 Nov 2021
ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics
Computer Vision and Pattern Recognition (CVPR), 2021
Aiham Taleb
Matthias Kirchler
Remo Monti
Christoph Lippert
SSL MedIm
350
69
0
26 Nov 2021
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes
Suhani Vora
Noha Radwan
Klaus Greff
H. Meyer
Kyle Genova
Mehdi S. M. Sajjadi
Etienne Pot
Andrea Tagliasacchi
Daniel Duckworth
280
139
0
25 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2021
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
215
65
0
25 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
177
80
0
24 Nov 2021
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
AAAI Conference on Artificial Intelligence (AAAI), 2021
Pritam Sarkar
Ali Etemad
SSL
288
15
0
09 Nov 2021
Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Jinghao Zhang
Yanqiao Zhu
Qiang Liu
Mengqi Zhang
Shu Wu
Liang Wang
226
72
0
01 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP VLM
259
319
0
21 Oct 2021
Learning 3D Semantic Segmentation with only 2D Image Supervision
International Conference on 3D Vision (3DV), 2021
Kyle Genova
Xiaoqi Yin
Abhijit Kundu
C. Pantofaru
Forrester Cole
Avneesh Sud
B. Brewington
B. Shucker
Thomas Funkhouser
3DPC
120
91
0
21 Oct 2021
Constrained Mean Shift for Representation Learning
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
SSL
136
0
0
19 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
229
48
0
19 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson
Henry Gouk
Chen Change Loy
Timothy M. Hospedales
SSL OOD AI4TS
194
334
0
18 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL AI4TS
131
0
0
13 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
114
35
0
11 Oct 2021
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding
Maomao Li
Tianyu Yang
Rui Qian
Haohang Xu
Qingyi Chen
Jue Wang
Hongkai Xiong
SSL
220
61
0
30 Sep 2021
Click-through Rate Prediction with Auto-Quantized Contrastive Learning
Yujie Pan
Jiangchao Yao
Bo Han
Kunyang Jia
Ya Zhang
Hongxia Yang
MQ
153
19
0
27 Sep 2021
Self-Supervised Video Representation Learning by Video Incoherence Detection
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Haozhi Cao
Yuecong Xu
Jianfei Yang
K. Mao
Lihua Xie
Jianxiong Yin
Simon See
SSL
104
8
0
26 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation
Xiangjie Sui
Esa Rahtu
206
12
0
18 Sep 2021
Learning Cross-modal Contrastive Features for Video Domain Adaptation
IEEE International Conference on Computer Vision (ICCV), 2021
Donghyun Kim
Yi-Hsuan Tsai
Bingbing Zhuang
Xiang Yu
Stan Sclaroff
Kate Saenko
Manmohan Chandraker
136
83
0
26 Aug 2021
Self-Supervised Video Representation Learning with Meta-Contrastive Network
Yuanze Lin
Xun Guo
Yan Lu
SSL
189
43
0
19 Aug 2021
TrUMAn: Trope Understanding in Movies and Animations
International Conference on Information and Knowledge Management (CIKM), 2021
Hung-Ting Su
Po-Wei Shen
Bing-Chen Tsai
Wen-Feng Cheng
Ke-Jyun Wang
Winston H. Hsu
122
6
0
10 Aug 2021